Hello, welcome everybody to our talk about the taxonomy and the tools of open source compliance. Let us introduce ourselves. My name is Greg Etchatari and I'm working as a senior open source specialist in the Nokia Open Source Program Office, and my colleague here is... Yeah, I'm Jan Jorell, I'm a student at Aalto University, and this summer I worked as a summer trainee at the Nokia Open Source Program Office. In this talk we will discuss a taxonomy of the open source compliance check, because we think it's very important to have a well defined set of actions for what we are doing in a compliance check, and we will also cover which tools cover which parts of this process. But let's start from the very beginning. What is the target of an open source compliance check? The target is to follow the license obligations of the open source software used in a product, which means that we have to know the list of open source software used in the product, we have to know the license of each of these open source components, we have to know the different obligations in these licenses, and we have to act according to them. So the first step is to get the list of open source software used in the product, which means creating the so-called bill of materials. The very difficult part of this is to get all the dependencies, and all the dependencies of the dependencies; I will discuss this a bit later. Then there is a second step, which is basically a decision: to decide if the usage of the open source component in the given product is okay or not. Different companies have different policies on the usage of open source based on their internal business processes, but in each case a decision has to be made whether the open source component can be used or not. And the third step is to fulfill the obligations of the license or licenses.
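To make these three checks a bit more concrete, here is a minimal sketch of what a single bill-of-materials entry has to capture for a compliance check; the field names and the example component are ours, purely for illustration, not from any specific tool or standard.

```python
from dataclasses import dataclass

# A minimal, illustrative bill-of-materials entry: the compliance check
# needs at least the component identity, its license, and what that
# license obliges us to do.
@dataclass
class BomEntry:
    name: str
    version: str
    license_id: str        # e.g. an SPDX license identifier
    obligations: list      # what the license requires us to do

entry = BomEntry(
    name="zlib",
    version="1.2.13",
    license_id="Zlib",
    obligations=["keep the license notice in the documentation"],
)
print(entry.name, entry.license_id)  # zlib Zlib
```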
So let's see these in detail. The first step is to get the list of the used open source components and their licenses, and it is a very complex step to create this full, so-called bill of materials. I collected all the steps which are needed for, let's say, a full analysis, and for each of them I will describe a bit what I mean by them. The first step is container content resolution, which means that if we are using containers and container images, we need to unpack the content of these containers so we are able to analyze the content of the container images, which means we have to get access to all the files stored in the container images. There is an interesting fact about container images: they use layers in the file system, so in theory it's possible to have hidden content in a container image. Some tools discover these hidden files, some tools do not, but either way we need this step to unpack the containers. Then we need to resolve the dependencies, and this can happen on two levels. There is a level I call operating system level package dependency resolution, which means that we use the tools the operating system itself uses (rpm, apt, or whatever the operating system has) to discover what packages are installed in the product we would like to ship. The other layer is technology level package dependency resolution, which means that we use language specific tools like pip or npm or Go's dependency tooling to discover what dependencies the source code we are producing is using. For example, in the case of Golang, this is the discovery of all the included Go modules and all the transitive dependencies of all these Go modules.
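The technology level resolution can be sketched very simply, assuming the dependency graph has already been read from a lock file or package manager output; the package names here are made up:

```python
# Minimal sketch: collecting transitive dependencies from a
# lock-file-like dependency graph (package names are illustrative).
def transitive_deps(graph, roots):
    """Walk the graph breadth-first and return every reachable package."""
    seen = set()
    queue = list(roots)
    while queue:
        pkg = queue.pop(0)
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen - set(roots)

# A toy graph: our product depends on "webframework", which pulls in more.
graph = {
    "webframework": ["httplib", "templating"],
    "httplib": ["tls"],
    "templating": [],
    "tls": [],
}
print(sorted(transitive_deps(graph, ["webframework"])))
# ['httplib', 'templating', 'tls']
```

The point of the sketch is the last line: a full bill of materials has to contain every one of these packages, not just the direct dependency.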
But up to this step we only have the list of the open source components; we do not have their source code, and we do not have any information about their licenses or copyrights. So we need to get the source code to run a full analysis. For this we need a source code downloader, which downloads the source code of the open source components based on the information we got from these different sources. Then there is a need to scan for copyright and license information, so a tool which goes through the source code and detects license and copyright information. This step can be replaced by a different step, which is online license checking. There is a database called ClearlyDefined, which is accessible to everybody online, and it contains the license and copyright information of different open source components. So instead of scanning, it is possible to get this information from ClearlyDefined. There is also another optional step, which is the binary analyzer. It can happen that we do not have access to the source code of some components in the product, and in this case binary analysis is needed, where we analyze the binary artifact of a build process. And there is a last step, a need for what is called forensic source code analysis, which means that we scan the source code again for code snippets copied from other places, like from any other open source project or from Stack Overflow or places like that. All of these steps are needed to get the full list of open source components, their licenses and their copyright information. But then there is step two, which is a bit simpler: we need to make a decision whether the usage of the open source project is okay or not, and for this we need basically two steps.
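As a toy illustration of what the license and copyright scanning step does, here is a sketch that only matches SPDX tags and copyright lines; real scanners like the ScanCode Toolkit match full license texts and many notice styles, so this is just the shape of the idea:

```python
import re

# Toy license/copyright scanner. Real tools match entire license texts;
# here we only look for SPDX tags and "Copyright (c)" lines.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.\-+]+)")
COPYRIGHT_RE = re.compile(r"Copyright\s+\(c\)\s+.*", re.IGNORECASE)

def scan_source(text):
    licenses = set(SPDX_RE.findall(text))
    copyrights = [m.group(0).strip() for m in COPYRIGHT_RE.finditer(text)]
    return licenses, copyrights

sample = """\
// SPDX-License-Identifier: MIT
// Copyright (c) 2021 Example Author
int main(void) { return 0; }
"""
licenses, copyrights = scan_source(sample)
print(licenses)     # {'MIT'}
print(copyrights)   # ['Copyright (c) 2021 Example Author']
```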
So first of all, we need a software structural analyzer, which means that we need to be able to get information about how the open source software communicates with other parts of the software, because some licenses have different obligations for open source software that is statically linked to other software versus dynamically linked. So the decision depends on these architectural choices, and we need to analyze that structure. Then we need basically a policy engine, which makes a decision based on all of this information. And then the next and last step is to fulfill the obligations. For this, we need basically two things. We need an obligation database, which means that we need information about which licenses have which obligations and, practically, how to fulfill them. And then we need some kind of compliance bundle which fulfills the obligations; this can be, for example, a list of the copyright holders in the documentation, or including the license texts in the documentation, or packaging source code for distribution, these kinds of things. All of these steps can happen in different phases and different places in a build pipeline. In the first case, everything happens in a totally distributed way, so each build executes these steps in the scope of the specific build. I represented the compliance and legal experts here with this small figure; their input is injected somehow into the software artifacts, as part of the code tree or something like that, in the context of the given build pipeline. Every step is executed there, and all the decisions are made based on these policies, which are basically part of the source code. The other architectural possibility is to do everything in a centralized manner, meaning that there is a centralized set of tools which executes all of these steps.
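Before looking at where these steps run, the decision step itself can be sketched as a tiny policy engine that combines the structural information (how a component is linked) with a license policy; the policy below is purely illustrative, since every company sets its own rules:

```python
# Toy policy engine: whether a component is acceptable can depend on
# how it is linked. This policy table is purely illustrative.
POLICY = {
    # license id -> set of linkage types considered acceptable
    "MIT": {"static", "dynamic"},
    "Apache-2.0": {"static", "dynamic"},
    "LGPL-2.1-only": {"dynamic"},   # e.g. allow only dynamic linking
    "GPL-3.0-only": set(),          # e.g. not allowed in this product
}

def check(component, license_id, linkage):
    allowed = POLICY.get(license_id, set())  # unknown license: escalate
    verdict = "OK" if linkage in allowed else "NEEDS LEGAL REVIEW"
    return f"{component} ({license_id}, {linkage}): {verdict}"

print(check("zlib-ng", "MIT", "static"))
# zlib-ng (MIT, static): OK
print(check("readline", "GPL-3.0-only", "dynamic"))
# readline (GPL-3.0-only, dynamic): NEEDS LEGAL REVIEW
```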
And the build pipeline just feeds information to these centralized tools and basically gets back a decision and the compliance bundle at the end of the compliance handling, and the legal and compliance persons interact with these centralized tools. And there is, of course, a hybrid way, when some parts of this process, like the composition analysis, are done as part of the pipeline, and other parts, like the decision making and the bundle generation, are done in the centralized tools. In Nokia we are using this third, hybrid solution: all products do the composition analysis themselves, based on the specific technologies they are using, and they upload all the data to a centralized tool; legal and compliance colleagues do their work interacting with the centralized tool, and the centralized tool provides a result and the compliance bundle back to the build pipeline. So we are using this hybrid way. Of course, it's possible to do other combinations, like having only the decision in a centralized place and moving the bundle generation down to the pipeline; it's up to each company to decide. Okay, that was the paperwork part, so let's hear about the actual tools which cover these steps. Yeah, okay, so I will talk a bit about some different compliance tools that we tested, mostly during the summer, a bit about the features they bring, and whether they're actually any good to use. But before I talk about that, I want to briefly mention some things that we looked at when selecting which tools we were going to investigate. Firstly, we wanted them to be open source, and secondly, we wanted them to be installable within a reasonable amount of time, so maybe a couple of days max. Okay, let's dive into the tools. The first tool that we tested was the OSS Review Toolkit, or ORT for short. It's an open source tool developed by HERE Technologies, and it's a toolkit consisting of different sub-tools.
So it has an analyzer tool, which can do this dependency analysis and, based on the source code, find all the dependencies, and it works by using the different technology specific package managers. So for example, if it's a Node.js project, it will use npm to figure out all the dependencies. Then it also has a downloader tool, which uses the analyzer results to download the source code for each of the dependencies. It's quite sophisticated: it can also use different version control tools like Git to clone specific versions of the software. Then there's the scanner tool, which does the static scanning of all the source code, and by default it uses the ScanCode Toolkit for this, which is another static license and copyright analyzer. So it uses that to scan all of the components and find the licenses and copyright notices. After this, there's an evaluator tool, which can look at the scan results and apply some custom rules to basically decide if it's okay to use the software. The rules have to be implemented in a Kotlin based domain specific language, so you can write scripts that specify these rules; for example, you could disallow using some copyleft licenses or something like that. And there's the reporter tool, which can generate reports in good looking, human readable formats, for example HTML and PDF and so on. Each of these tools depends on the previous one for input files, so they can be run in this kind of pipeline way. And I think I didn't mention it's a CLI tool, so it can be run in scripts or on some CI/CD services or whatever. Okay, up next is Tern, which is a tool developed by VMware. It is focused on container analysis and finding the bill of materials for container images. It does a kind of dynamic analysis: it works by mounting the different layers of the image and then running some shell scripts inside of them to find out the installed packages.
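Since layers keep coming up, here is a simplified model of how a merged container view is built, and of how an OCI-style `.wh.` whiteout entry hides a file that still physically exists in a lower layer; the layer contents are made up, and real tools of course work on tarballs, not dicts:

```python
# Sketch of why layered images can "hide" content: each layer is applied
# on top of the previous one, and a ".wh." whiteout entry removes a file
# from the merged view even though the lower layer still contains its data.
def merge_layers(layers):
    """Each layer maps path -> content; later layers win."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rsplit("/", 1)[-1]
            if name.startswith(".wh."):
                # OCI-style whiteout: drop the shadowed file from the view
                hidden = path.replace(".wh.", "", 1)
                merged.pop(hidden, None)
            else:
                merged[path] = content
    return merged

layers = [
    {"etc/secret.key": "oops, committed by mistake"},  # lower layer
    {"etc/.wh.secret.key": ""},                        # whiteout in upper layer
]
print(merge_layers(layers))  # {} -- gone from the merged view...
print(layers[0])             # ...but still present in the lower layer's data
```

This is exactly why a thorough scanner has to look at every layer, not only at the final merged filesystem.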
And yeah, it can also use, as extensions, the ScanCode Toolkit for static analysis of all the files in the container, and also the CVE Binary Tool for vulnerability scanning, so it can detect if some package has a known vulnerability. And it generates reports in different formats like HTML and also SPDX. Okay, the next tool is Licensed, a tool developed by GitHub, and it's quite similar to ORT in its features, so it can do dependency analysis based on source code, also using these different technology specific package managers. It can't download the source code, but instead it has a feature to fetch the licenses for each dependency. Based on that, you can write in a config file a list of allowed licenses, and it can check against that list whether any licenses are not allowed, and then it can generate reports for each component of the project in this notice format. Okay, up next is the FOSSA CLI, which I think is very similar to Licensed in its features. It can also do this dependency analysis in a similar way, but unfortunately that's the only feature that works standalone; the rest of the features require you to set up an API key with fossa.com, so you have to sign up for their services. So the dependency analysis part is the only part that can be run without that setup, but it has features to check for license policy violations on fossa.com and also to generate reports and things like that. Okay, up next is FOSSology, which is a Linux Foundation project. It's mainly focused on scanning and reviewing, so it can't do this dependency analysis, but you can upload software to its database using the web UI. It has two different license and copyright scanners, which try to complement each other, and then, based on the results, you can do some manual clearing or reviewing, so you can look at all the licenses found and even correct something if you think it's wrong.
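The idea of two complementing scanners plus manual review can be sketched like this; the file paths and per-file scanner results are invented, and real tools keep much richer evidence than a single license id per file:

```python
# Sketch: combining the results of two license scanners. Where they agree,
# the finding is accepted automatically; where they disagree (or one has
# no result), the file is queued for manual clearing by a reviewer.
def combine(scan_a, scan_b):
    decisions = {}
    for path in scan_a.keys() | scan_b.keys():
        a, b = scan_a.get(path), scan_b.get(path)
        if a == b and a is not None:
            decisions[path] = ("accepted", a)
        else:
            decisions[path] = ("needs-review", (a, b))
    return decisions

scan_a = {"src/main.c": "MIT", "src/util.c": "MIT"}
scan_b = {"src/main.c": "MIT", "src/util.c": "BSD-3-Clause"}
for path, verdict in sorted(combine(scan_a, scan_b).items()):
    print(path, verdict)
# src/main.c ('accepted', 'MIT')
# src/util.c ('needs-review', ('MIT', 'BSD-3-Clause'))
```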
And yeah, it's a web UI, but the web UI looks a bit outdated in my opinion. Finally, it can generate reports in different formats. Okay, up next is SW360 Antenna, which is a tool developed by Eclipse. First I can mention that it can be installed as a Maven or a Gradle plugin, so if you're using Maven or Gradle you can use it as a plugin and include it as a build step, which is quite neat. It can do dependency analysis using this Maven dependency tree analyzer, but that's primarily for Java projects, so if it's another type of project, you can feed it ORT analyzer results instead. It can download source code and licenses using the Maven artifact resolver, but again, that's primarily for Java projects; for other types of projects, the ORT downloader is included, so you can use that to download the source code. It can also check for forbidden licenses, as configured in your config file, and it can generate reports in different formats, like most of the other tools. But it can also be integrated into the Eclipse SW360 platform, which is an open source software catalog application, so you can later view your results there in a web interface and things like that. Okay. Yeah, the last tool that we looked at was this go-licenses tool, a tool developed by Google which can detect dependencies for Go projects. We noticed that it doesn't detect versions, though, so that's a bit of a downside. It has a feature to collect all the artifacts needed for license compliance: it can figure out the licenses of all the dependencies and, based on those, collect the license texts and copyright notices, and even the source code for licenses that require it, and collect all of that into a single folder. So that's quite neat.
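The "collect everything into one folder" idea, which is basically the compliance bundle from step three, looks roughly like this; the dependency names and license texts are made up, and real tools also fetch the texts for you:

```python
import os
import tempfile

# Sketch of assembling a compliance bundle: copy each dependency's
# license text into a single notices directory for distribution.
def collect_notices(deps, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for name, license_text in deps.items():
        with open(os.path.join(out_dir, f"{name}-LICENSE.txt"), "w") as f:
            f.write(license_text)
    return sorted(os.listdir(out_dir))

deps = {
    "examplelib": "MIT License\nCopyright (c) ...",
    "othermod": "Apache License 2.0\n...",
}
with tempfile.TemporaryDirectory() as tmp:
    print(collect_notices(deps, os.path.join(tmp, "notices")))
# ['examplelib-LICENSE.txt', 'othermod-LICENSE.txt']
```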
And it can also check for forbidden licenses according to Google's license classifier, and generate CSV reports. Okay, finally, I've made this comparison table to compare all of the tools. Up top we have the tools, and then the different steps in compliance. From this we can see that there's no tool that can perform all of these steps perfectly, and some of course overlap in their features. And with this, we've reached the end of our presentation. Thank you for listening to us. If you have any questions, please use the conference platform to contact us, or you can just use our contact information. With this we would like to thank you, and have a nice conference.