So my name is Martin, I'm with Debricked, which is a company that is currently owned by OpenText. What we do at Debricked is offer an SCA tool, that is, software composition analysis. The goal is to help customers and organizations use their open source components securely: fixing vulnerabilities, making sure they are not violating any licenses, and staying compliant in that regard. But we don't only want to help organizations use open source securely, we also want to help them choose open source securely, so that already from the beginning, when they are choosing the open source components, we can minimize the risk of running into trouble in the future. And this last part is what I'm going to focus on in this talk.

So let us just look at the problem. This is an actual problem that we had in our own organization. We currently use an open source library called Pony as an ORM. An ORM, if you don't know, is a middleware layer that you use when you talk to a database. But when we look at alternatives, it seems that SQLAlchemy, which is an alternative ORM, is more popular. Looking at how popular it is, how many are using it, and what the community around it looks like, it seems that if we could go back in time and choose again, we would probably have chosen SQLAlchemy instead of Pony. Now, I don't mean to say anything bad about Pony here. We are using it, we are continuing to use it, and we are quite happy with it. But if you compare the community and the popularity of the two, SQLAlchemy is the more popular one, so we would probably have chosen that one if we could go back in time. The problem for us is that we now have about 500,000 lines of code in a particular part of our software, and 30,000 of those lines use Pony for interacting with the database. So if we wanted to change, it would be a heavy burden for us; we would have to put a lot of resources into changing this now. In some sense it is a little bit late for us to change it.

Another problem we ran into is that we started using a library called Flask-RESTPlus for building a REST API. We could use it for validating JSON objects, and we could also use it to create automatic API documentation. The problem is that Flask-RESTPlus was at one point just completely abandoned as a project. We were trying to contribute to it from our side, but we didn't get any response from the maintainer, and it turned out that the maintainer had simply abandoned the project. Instead there was a fork called Flask-RESTX, and in this case we were quite lucky that we could just switch over to Flask-RESTX. But the question is, of course, whether we could have detected this decline of the Flask-RESTPlus project early on and switched earlier.

So if we want to generalize this problem, the question is: how can we know whether an open source project is suitable for use before we start using it at all? This is the problem that we are trying to solve here. So, just a little bit of background.
If we go back a few decades, the waterfall model was a very popular model for software development. We are in some sense using a variant of it today as well, but much more rapidly, doing things over and over in cycles. Basically, in the waterfall model we started with the system requirements, we did analysis, we designed the software, we started coding, then we did the testing, and then we went into operation and maintenance. The idea that arose in the testing community was to test earlier, to test as early as possible, and this became shift left. So it started within the testing community. The benefit of shifting software testing to the left in this waterfall model is that if you test late, any defects you find are found only after significant resources have already been spent on the requirements and on the design of the software. Debugging becomes more difficult because there is much more software to debug, and there is also less time to fix the defects you find in the testing phase, because if you test late there might be just weeks left before you have to ship. So some of the software you ship is buggy, and you accumulate a lot of technical debt. If you instead shift left and test earlier, fewer resources are wasted on redoing things. It becomes easier and much faster to fix the defects, since there is less code that relies on the buggy part, so you can do this much faster and more defects can be fixed.

Now if we turn to security: security is not all of testing, but security testing is a part of testing. For example, when you do static code analysis, you want to test immediately when the code is being written, maybe already in the IDE where you develop the code. That is shifting left for static security testing. For dynamic testing, you want to test as soon as you can run the software, because you need a running environment to test it dynamically, but as soon as you can run it you want to test it, to find the mistakes and the problems as early as possible. The same goes for software composition analysis, where you look at open source components and third-party software: you want to test them as soon as they are included. For container scanning, shifting left means you want to scan as soon as you have built the containers. If you want to test your infrastructure as code, you want to test it immediately after the code has been written. And the same goes for penetration testing, where you want to test as early as possible. All of these things can be seen as shifting left for security.

Another way of seeing this, and you shouldn't take this graph as something scientific, it is just a depiction of the idea behind shifting left in security: traditionally we did a lot of the testing rather close to the end of the cycle, even though we did some testing early on as well, so not everything was done in the late phase. With the shift-left approach, the idea is to take as much as possible and move it as far to the left as possible along this waterfall-style timeline. This doesn't mean that we are not doing things later on.
We're just trying to do it as soon as possible. With that said, shift right has also risen as a popular term, and that also makes sense in some respects. When you want to test something, you often want to do as much as you can in a live environment. If you want to do dynamic testing, it is in some cases better to do it in a live environment than, for example, in a staging environment, because there will be different usage, and more usage, in a live environment. So some parts of the testing need to be done in the live environment, and that is then shifting to the right in this model. Everything here is still an integrated part of the development process. Some examples of shifting right are penetration testing and bug bounty programs; bug bounty programs you pretty much have to run in a live environment, because that is what you are paying the researchers to test. If you do vulnerability scanning, some vulnerabilities are only visible in a live environment and in the usage you have there. There are several other examples as well, for instance log analysis, which you can only do when you have logs, so you also want to do this quite late in the process.

What we are trying to do today, or rather a more representative picture of what we are trying to do today, is DevOps, or DevSecOps, where we have rapid development. We are doing similar things as in the waterfall model, but much more rapidly, much more often, and in much shorter cycles. We plan, we code, we build, we test, we release, deploy, operate and monitor, and then we go around again, and we do this as quickly as possible. What you can see if you look at this DevOps chart is that you do not see any security at all, and in some sense that is a good thing. Security is not written down, because the idea is that security is everywhere: we try to do security testing everywhere throughout this DevOps process. So when we talk about this holistic approach to security, we also talk about shifting everywhere, and that ties very well into the DevSecOps process that many people are using today. The important thing to note is that when we work this rapidly, coming back to testing and deployment and coding over and over again very quickly, the testing needs to be automated, because we cannot do it manually when we do it this often. So it is crucial to have automated tools and automated testing in place.

Now, moving to open source and security testing of open source, let us look at the life cycle. The first thing you do when you pick an open source component is that you evaluate the component and select the component. After that, you start doing dependency scanning, which you typically do with a software composition analysis tool. You do it during development, you do it when you are building and testing, and you can also do it during maintenance in order to see what new security vulnerabilities have been found in the deployed software. Then you also do static and dynamic code analysis. Sometimes you do that for the open source you take in as well; sometimes that is out of scope, depending on your organization and the strategy you have. And then you do penetration testing in an environment.
And then of course the open source components are part of the penetration testing. Finally, you have incident response and patch management. What I want to do now is narrow this security testing for open source down to only the first part: what can we do in terms of testing when we evaluate and select the open source components?

So, evaluating and selecting. This is a simple picture of what it could look like in an organization when you try to select an open source component. What you do in the beginning is define policies for open source usage: what rules do we have in our organization when we use open source, and what are the minimum requirements for the open source that we take in? In parallel to this, we of course need to define the needs and the functionality: what kind of software do we need, what functionality does it have to provide, and can I even use it in my software? When we have those needs and the policies, we can evaluate open source components against them, and once we have done that evaluation, we can decide on a component to use.

This is a cartoon that many of you have probably seen already; at some point it became very famous, specifically in the open source community. The idea is to depict the situation where you build a pretty complicated piece of software, and somewhere down the stack this software is using one small open source component that is possibly maintained by only one person. This open source component is very important for the project, so what will happen if it stops being maintained, for example? How will you handle that situation? That is the kind of situation we are trying to help resolve here.

If you happen to choose the wrong open source component, there can be many negative implications. For security and compliance, some implications could be security vulnerabilities that are not being fixed, legal issues, and licensing risks, and it could also be challenging when it comes to compliance. For example, if you are in the credit card industry you have specific compliance requirements to meet, and the same goes for fintech and for medical devices. For performance and technical issues, there can be compatibility issues: if you want to build out your software later on, you may have problems if you have chosen an open source component that cannot be extended in a way that matches your plans, and you can run into performance degradation and technical debt. For operation and maintenance, it can become a maintenance burden if you, as an organization, have to start contributing, or maybe even take over responsibility for the component. This is of course possible and it has happened, but it is a burden that for some organizations is quite heavy. You may also face limited features and flexibility if you want to extend on this component later on. And for the organization, there can be reputational and user-experience damage.
So the trust in your organization: if there is a problem with the software that you are using, that is of course a big issue for many organizations, and the user experience can also degrade. If we have chosen the wrong component from the beginning, it can sometimes mean that we need to start all over again: choose a new component, evaluate that component, and adapt our code and our software to use that component instead. Starting over will cause a lot of project delays, and that is not very popular in many organizations.

So it can be a problem if we choose wrong, but the fact of the matter is that we have to choose. Almost all organizations have to make the choice of open source components, and if you look at the available data, around 60 to 80% of a code base is open source, with the rest being in-house code. This will of course vary between organizations; it could maybe go down to 13 in some cases and be at 95 in others, but looking at the reports and the data that are out there, around 60 to 80% is open source. So you have to choose open source components, and you have to do it over and over again as you extend your software.

So what should we look for when we choose a component? We can divide it into two categories. The first category is: can I use the component at all? Does it even do what I need? Is it extendable? Is it easy to integrate with the software that we have? Which programming language is used? Does it have proper documentation, which is very important if you need to use it and integrate it with your software? And is it efficient? You don't want a new component to become a bottleneck when you are delivering your software.

The other question, apart from can I use it, is: should I use it? Under that we have, for example: How is it licensed? Is it actively being used by others, and how many have used it before you? It is really good to know that a lot of people and a lot of other software have used this component before you, so that you are not the first one to try it out. Is it currently supported? How many are maintaining it? If there is only one maintainer, we may run into the problem where that maintainer just abandons the project, but if there are several maintainers, that gives more confidence that it will continue to be maintained for the foreseeable future. Also, who is maintaining it? Is it very experienced developers, or someone who is maybe doing this for the first time? Doing it for the first time may not be bad if that person is really good and ambitious, but that may not be the case, and it increases the risk. How are security vulnerabilities handled? Are they fixed immediately, or does someone leave them for half a year and only then start thinking about fixing them? And how quickly are bugs fixed? All of these are things you may want to consider when you start choosing your open source components, and many more of course; this is just a short list. The interesting part from our side is how much of this we can automate. We were talking before about automation of tests, so how much here can we also automate, to relieve the burden on the people who have to assess the components?
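Just to illustrate the kind of data that can be pulled automatically, here is a minimal sketch, not our actual implementation, that fetches a few of these signals from the public GitHub REST API. The repository paths are assumptions made for the sake of the example, and in practice you would pull far more data and track it over time:

    # Minimal sketch: pull a few health signals for candidate projects from the
    # public GitHub REST API. Unauthenticated requests are heavily rate limited,
    # so pass a token for real use. Repository paths are example assumptions.
    from typing import Optional
    import requests

    GITHUB_API = "https://api.github.com"

    def repo_signals(owner: str, repo: str, token: Optional[str] = None) -> dict:
        """Return a few popularity and maintenance signals for one repository."""
        headers = {"Accept": "application/vnd.github+json"}
        if token:
            headers["Authorization"] = f"Bearer {token}"
        r = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}", headers=headers, timeout=10)
        r.raise_for_status()
        data = r.json()
        return {
            "stars": data["stargazers_count"],
            "forks": data["forks_count"],
            "watchers": data["subscribers_count"],
            "open_issues": data["open_issues_count"],
            "license": (data.get("license") or {}).get("spdx_id"),
            "pushed_at": data["pushed_at"],  # last push, a rough activity signal
            "archived": data["archived"],    # archived repos are no longer maintained
        }

    if __name__ == "__main__":
        for owner, repo in [("sqlalchemy", "sqlalchemy"), ("ponyorm", "pony")]:
            print(repo, repo_signals(owner, repo))

Note that the same call also returns the declared license, which feeds directly into the hard, showstopper-style rules discussed next.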
So let us try to streamline this selection process a little bit and define the policies that we want to have. One policy can be based on hard values, the showstoppers. A license, for example, can be a showstopper: if the component uses a license that we would not be able to comply with, given how we are using the software, then we simply cannot use it. That would just rule the software out. But there are also soft values. Soft values would be things like: what is the minimum security standard of the components that we are going to allow in, what is the minimum level of community activity around an open source component that we need to see in order to choose it, and how have others assessed this software before? Doing this as just snapshots, taking the data and looking at it as it is right now, does not give us the full picture, because if something is degrading and becoming less and less maintained, many of these things might still look good as a snapshot. If you instead start looking at the trends and the evolution over time for the specific software, you may find data that is really valuable to you.

The goal that we have is basically two things. The first is: how can we compare different open source projects against each other when we are making the assessment, the evaluation of whether we are going to use a project or not? The other, related question is: is this open source software good enough, or should I write the functionality myself?

Just to get into the mindset, and this is only an example: try to think about how many GitHub stars are considered good for a software project. Maybe 100,000 is good. Maybe 10,000 is good. Maybe even 20 is good. This is a really difficult question to answer, because it depends entirely on what type of software you are looking at. For some types of software, 10,000 would be really bad; for other types, 10,000 would be considered extremely good. And this is the difficulty we run into when we try to do these things. So the assumption we are making is that the projects you want to compare against each other are of the same order of magnitude when it comes to the metrics we are looking at. As an example, if you are looking for a front-end framework in JavaScript, you may want to compare React, Vue and Angular. If you look at the number of stars, we have 213,000 for React, 205,000 for Vue, and 90,000 for Angular. In this context, 200 stars would be considered pretty bad. But in another context, 200 stars might be considered really good, if the software only has a small task to do. The task might be to just validate JSON objects, doing some string validation; in that case, maybe a few hundred stars would be really good. So we have these dynamics that we need to handle in some way.

If we pull down all this data from GitHub and start looking at the distributions we get, what do they look like? In very many cases, the distribution we see is an exponential distribution with a very low mean, which you see to the left here.
The problem is that if we want to give this a score between zero and 100, there will be a lot of projects with a very low score and very few with a medium or high score. So what we do is transform the distribution that we get into another distribution; this is called a quantile transformation. The concept is rather simple: we generate the distribution that we want, which in this case is the normal distribution, we sort all the data we have in ascending or descending order, it doesn't matter which, and then we place it onto the normal distribution instead. So basically we take the distribution on the left and map it onto a normal distribution. The effect, if we continue with the GitHub stars, is that the difference between, say, 10 and 20 stars becomes roughly comparable to the difference between, say, 100,000 and 200,000 stars, which makes it possible to compare these things. We then take the normal distribution that we have transformed the data onto and map it to a score between zero and 100.
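As a rough illustration of this step, and not our exact implementation, here is a minimal sketch: rank the raw values of one feature across all projects, place the ranks on a standard normal distribution, and rescale to a 0-100 score.

    # Minimal sketch of the quantile transformation idea: map the skewed raw
    # distribution of a feature (e.g. GitHub stars) onto a normal distribution,
    # then rescale to a 0-100 score. Illustration only, not the production logic.
    import numpy as np
    from scipy import stats

    def quantile_score(values):
        """Map raw feature values to 0-100 scores via a rank-based normal transform."""
        n = len(values)
        ranks = stats.rankdata(values)        # 1..n, ties get the average rank
        quantiles = (ranks - 0.5) / n         # empirical quantiles, strictly in (0, 1)
        z = stats.norm.ppf(quantiles)         # place each project on a standard normal
        return 100 * (z - z.min()) / (z.max() - z.min())  # rescale linearly to 0..100

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Star counts in the wild look roughly exponential with a low mean:
        # a huge number of tiny projects and a long tail of very popular ones.
        stars = rng.exponential(scale=50, size=10_000).astype(int)
        scores = quantile_score(stars)
        print("median stars:", np.median(stars), "median score:", round(float(np.median(scores)), 1))

In practice the raw distributions are rarely this clean, so some manual adjustment of the transformation is still needed, which I will come back to at the end.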
Let me take an example. I used GitHub stars before, but that is just one of the features we can look at. For the popularity features, we have categorized them into four categories, and we call these categories practices, and within each practice we have features. One practice is usage, where we have the features total number of downloads and total number of forks for the project. Then we have developer popularity, where we have the total number of stars, as in the example earlier, the total number of watchers for the project, the contributor influence, basically defined as how many people are following the contributors who contribute to this project, and also the trend for the number of contributors. In community activity, we look at recent commits and the trend for the commits, and we do the same thing for issues, pull requests and closed issues; we look both at the snapshot and at the trend for, say, the last six months, so we can see if something is going down and flag it as potentially becoming a problem. The fourth practice is ecosystem buzz. We then combine all these practices into one metric, and this metric is called popularity.

We currently do this for three different metrics. We do it for popularity, as in the example I just gave. We also do it, and I will not go into the details, for contributors: under the contributors metric we have the practices experience of the contributors, efficiency, diversity, activity, the commitment of the core team, where the core team is those who can approve pull requests, and contributor longevity, meaning how long the contributors stay with the project. Under each of these we again have several features. And for security we do a similar thing, with vulnerability response, vulnerability risk, coding best practices and bug reporting. The features here are data that we can get from, for example, GitHub.

We then weigh these features together in order to build the practices. As you can see in this example, we are just weighting them equally, so if a practice has four features, each of them has a 25% influence on the practice score. We do the same thing for all the features when we build the practices, and then we combine the practices in the same way into a metric. We can do this in different ways; right now we use equal weights, but we could also make it user-determined. One variant would be that when we weigh the features together to build the practices, at the bottom here, we try to maximize the signal that we get for the practice over all the open source components, so that we get as much information as possible when building our practices. And then possibly we can make it tweakable how the practices are combined into a metric, because that would mean that organizations and users can choose which practice is most important for them in their policy.
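As a small sketch of this aggregation step, with illustrative feature names and weights rather than the exact production set, and assuming the feature values have already been mapped to 0-100 scores as above:

    # Minimal sketch of combining 0-100 feature scores into practices and a metric.
    # Equal weights by default; an organization could supply its own weights.
    # Feature and practice names below are illustrative, not the exact production set.

    def weighted_average(scores, weights=None):
        """Weighted average of a dict of named scores; equal weights if none are given."""
        if weights is None:
            weights = {name: 1.0 for name in scores}
        total = sum(weights[name] for name in scores)
        return sum(scores[name] * weights[name] for name in scores) / total

    # 0-100 feature scores for one project, e.g. produced by the quantile transform above.
    features = {
        "usage": {"downloads": 72.0, "forks": 65.0},
        "developer_popularity": {"stars": 82.0, "watchers": 74.0, "contributor_influence": 58.0},
        "community_activity": {"recent_commits": 61.0, "commit_trend": 55.0, "closed_issues": 67.0},
    }

    # Step 1: features -> practice scores (equal weights here).
    practices = {name: weighted_average(feats) for name, feats in features.items()}

    # Step 2: practices -> metric, optionally with user-determined weights from the policy.
    popularity = weighted_average(practices, weights={"usage": 1.0,
                                                      "developer_popularity": 2.0,
                                                      "community_activity": 1.0})
    print(practices, round(popularity, 1))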
So, to finish, let us return to the problem from the beginning of the talk. We used the Pony ORM, but we were thinking that maybe we should have used the SQLAlchemy ORM instead. With this implemented, we can look at the data we get. Look at the left picture first: here we have an overlay, where the dark dots are SQLAlchemy and the lighter dots are Pony. When we use this data and combine it into our metrics and practices, it is clear, also when we quantify it, that SQLAlchemy is the better choice when it comes to popularity. If you look at the right-hand graph, SQLAlchemy is also clearly the better choice when we look at the contributor data. So this was not just a hunch we had when we looked at it manually: when we gathered the data and put it into these charts, we could see that our hunch was correct, and we could have had this information from the very beginning if we had had this tool to use. Further down you see the popularity score, where we combine everything into a score between zero and 100. I'm not sure how clearly you can see it from the back, but the popularity for SQLAlchemy is 82 and the popularity for Pony is 61, so clearly higher for SQLAlchemy. And when you look at contributors, the score is 70 for SQLAlchemy and 47 for Pony, so there is a clear difference between the two projects there as well. Again, I'm not saying that Pony is bad software in any way. It is just pretty clear, both when we look at it manually and when we look at the data, that SQLAlchemy is the more popular choice with a more active community.

Returning also to the Flask problem. In the chart on the top left you see the repository stars over time; the light line is Flask-RESTPlus, the project that was abandoned, and the dark line is the replacement project. You can clearly see that the number of stars for Flask-RESTPlus keeps increasing, but the increase clearly slows down after Flask-RESTX was forked in January 2020, while Flask-RESTX rises much more steeply. In the top right graph you see the cumulative sum of closed and opened issues each month, and here you can clearly see that it goes completely flat for Flask-RESTPlus once Flask-RESTX was forked from it. So the trend is really clearly flat, and we could have seen this quite early with this data. The same goes for the monthly commits, in the bottom left: for Flask-RESTPlus there are basically no commits at all from January 2020 and onwards. So we could detect this very easily and automatically using this data. And if you compare, in the bottom right, just the popularity and the contributor scores, there is not a very big difference, but Flask-RESTX clearly has the higher score.

What we can do now, finally, is take policies and define in our organization what the minimum values are for contributors, for popularity and also for security in order for us to take in an open source project. Here you can see in the tool we are using, if I can find the pointer, that if you take this particular open source project, you will have a failing pipeline. We have defined a policy saying that if a specific condition is met, in this case that we do not allow any strong copyleft licenses in our project, and this component has one because it is GPL-2.0 or later, then we want to fail the pipeline. So already before choosing the project, you can see that your pipeline will fail given the policy you have put in place in your organization, and you should probably not even consider using this open source software. Then we have another policy that does not fail the pipeline; you will not see it here, but it sends an email to the administrators if something violates the policy. You don't see the policy itself here either, but it says that the security score has to be above a certain value; it fails here, and it raises an alarm to the administrators. And the second project has all policies passing in the pipeline. So, and this is not all you should base the decision on, but it gives you information, it gives you a signal about what will happen when you choose this project: how does it fit your policies?
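To make the policy idea a bit more concrete, here is a minimal sketch of what such a check could look like; the rule names, thresholds and example components are made up, and the real tool of course has its own policy format:

    # Minimal sketch of evaluating selection policies against a candidate component.
    # Rules, thresholds, field names and example values are made up for illustration.
    from dataclasses import dataclass

    STRONG_COPYLEFT = {"GPL-2.0-or-later", "GPL-3.0-only", "AGPL-3.0-only"}

    @dataclass
    class Candidate:
        name: str
        license: str
        popularity: float    # 0-100 score
        contributors: float  # 0-100 score
        security: float      # 0-100 score

    def evaluate(c):
        """Return (severity, message) findings; a 'fail' finding breaks the pipeline."""
        findings = []
        # Hard value / showstopper: strong copyleft licenses are not allowed.
        if c.license in STRONG_COPYLEFT:
            findings.append(("fail", f"{c.name}: strong copyleft license {c.license}"))
        # Soft values: minimum scores, here only warning and notifying administrators.
        if c.security < 50:
            findings.append(("warn", f"{c.name}: security score {c.security} below 50"))
        if c.popularity < 40 or c.contributors < 40:
            findings.append(("warn", f"{c.name}: low popularity or contributor score"))
        return findings

    if __name__ == "__main__":
        for candidate in [
            Candidate("component-a", "GPL-2.0-or-later", 55, 38, 35),
            Candidate("component-b", "MIT", 75, 68, 66),
        ]:
            print(candidate.name, evaluate(candidate) or "all policies pass")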
So, concluding: it is possible, by doing this, to extract very many features automatically from open source projects, and we can use them as the basis for the policies that we want to set. A problem that we do run into, and that we have somewhat of a solution to with this quantile transformation, is that the actual distributions are not really straightforward to handle. The example I gave, going from the exponential to the normal distribution: if it were that easy, we would be really happy. We still have to do some manual intervention to make these distributions match what you would expect as a user or developer. What you also need to remember is that low metrics are not necessarily bad in the context of the type of software that you have; you can have low metrics and still have quite a good piece of software within its category. And if you want more details on this, on the features and the practices that we use and how we combine them, I invite you to visit our documentation.

You can also try this out if you want, by going to debricked.com/select, where you can search for any software you like and see the metrics and the data that we have. I just want to warn you that this is still in an experimental phase. We are trying to improve it continuously, and we are really happy to get feedback from any one of you: if you have used it and you see things that can be improved, and I'm sure there are several things that can be improved, that helps us maximize the value for organizations that are using this. And if you want to know more, just come and see us at our booth downstairs. With that, thank you very much for coming to this talk. Any questions?

Yes? Yes, this is a very good question, and this is of course a real problem. What we are doing today is that we are not really categorizing the software; the categories are basically based on programming language. If it is JavaScript, it is measured in terms of JavaScript; if it is Python, it is measured in terms of Python, so we compare it with all the Python projects. The result is of course that some categories of software will get very low scores, because we are comparing them with everything else written in the same programming language. If you look at React, for example, which is a really popular project, and then at a much smaller project, they are not really comparable, so the numbers will not really be fair if you start comparing them directly. The thing is that we don't think you are going to compare React with a small project; we think that if you are comparing React with something, you are comparing it with something similar, and then the comparison makes sense. That means that if you look at the metrics and you have one project at 95 and one at 85 in the same category, you can probably say that the 95 one is a very good project and the 85 one is a bit less good, but 85 is still good. In another category you might have one project at 45 and another at 35; compared with the first two, both of them look really bad, but that does not mean they have to be bad, because in the context of that comparison, the one at 45 is better than the one at 35. This does give you a problem if you say that you don't want anything below 50, because then none of the small libraries will get through. Yeah.

So you mean you're downloading something that... Yes? How do you rank this kind of product? For licensing specifically, we are not doing any ranking. You just define on your side which licenses you are going to allow to pass the assessment, so there is no scoring of licenses in that sense. Yeah. Yeah, sure. You can say in the policy that you are not going to allow anything that is weak copyleft or strong copyleft, and we will flag that for you. You can do it by category if you want, but you can also name specific licenses. You can say that anything MIT, BSD or Apache will pass, no questions asked, or that anything permissive will pass, no questions asked. We have of course done the categorization into permissive and so on in the background. No, no, it is real time.
So we are basing this on our own database, which is a mirror of the GitHub data. We are pulling and updating that mirror, not continuously, but very often, and then we can pull the data from our own database. Yeah, it's not that I don't want to tell you, it's that I don't have the exact figure for the update frequency, but I would say it is more often than every two weeks for sure, though not every 10 minutes. Yeah, no, it is more often than once a month, but I can get back to you with the exact update frequency if you want, no problem with that.

So, false positives, that is a good question. How would you define a false positive here? We are not making decisions, we are producing statistical metrics based on data. So you have to define what a false positive is. Depending on your definition, I am sure we will have many false positives, but they will probably not be the same false positives as under someone else's definition, because this is essentially continuous data, even though it is of course discrete. Yes, so that would be a problem in the underlying data, and that is of course something we are working on continuously, making sure that the data we are pulling is correct. There are two steps here: we need to pull the data and interpret it, and then we base the metrics on that data. So there could of course be false positives in how we pull the data and in how we interpret the data that we get.

Yeah, so we are currently not looking at the transitive dependencies. If you are choosing an open source software, we give you metrics for that software; we are not looking at all the transitive dependencies used by that software, which is definitely a limitation. Later on it is handled by the SCA component itself, of course, but when you are choosing a component, we are currently not looking at transitive dependencies. It would be a huge amount of data, which we can possibly do in the future; we have of course considered it. All right, thank you very much.