Hello, my name is Krzysztof Opaszek. Hello, my name is Aleksandr Mazuruk. And today we'd like to share with you the story of our bumpy ride around container license compliance.

So, the agenda for today: first of all we would like to introduce the environment in which we are dealing with this problem, then what we have tried, what didn't work, what we have now, and where we are going in the near future. First of all, a disclaimer: neither of us is a lawyer. If you need a comprehensive guideline to license compliance in Docker or in general, please ask your lawyer to provide you one. This presentation cannot be viewed as such a guideline.

So first of all, the problem statement. Usually when people talk about license compliance, they are thinking about corporate environments. But here we are dealing with license compliance in the open source project called ONAP. You've probably heard about it, as it's a center of gravity for 5G networks. But for the purpose of this talk, we are not going to focus on the functional side of ONAP, but rather the technical side. From the perspective of license compliance, ONAP is a few hundred Git repos in Java, Python and a few other languages. They build into over a hundred Docker containers, which are hosted on our own Nexus instance that is publicly available. And ONAP can be deployed with a single click thanks to Helm.

So what was in place when we started our journey? First of all, we had an automated Docker release process that was prepared by LF IT, and that worked fine for most of the PTLs, so we needed to not break that one. All the heavy lifting around this process, and unit testing in general, is being done by our Jenkins instance. We also use Sonar to find issues in our source code. And we have periodic license scans of our Git repos to catch all the license issues that appear in our source code. Finally, we also have the base images provided by the integration team.
Unfortunately, as the initial research showed, they were not being used by many of our subprojects.

So how did it all start? I just mentioned a few seconds ago that we had periodic scans of our source code to look for license issues. After one of those scans, one of the community members started asking questions of Steve Winslow from the Linux Foundation, who provided great support regarding license compliance for us. The questions were: hey, we are putting so much effort into making sure that our source code is clean, but really, a number of our users do not even see our source code, because they are deploying our Docker containers. So how about them? How about all the databases that we deploy as a part of ONAP? And a few other questions, which resulted in quite a long discussion on that mailing list. The final clarification came from an article that was shared with us by Steve Winslow. It's a really great piece that I recommend to anyone who's dealing with Docker containers.

If you are unable to read it now or in the future, here is the most important outcome from the article. First of all, if you are making something publicly available, it means that you are distributing it. The same applies to source code, and the same applies to Docker containers. Depending on what you distribute, the compliance process has different complexity. If you are distributing source code, you need to make sure that it is clean. If you are distributing a Dockerfile, it's treated the same way as source code, so you need to make sure that you are, for example, not copy-pasting fragments that come from incompatible licenses. But if you are distributing a Docker image, then you are responsible not only for the stuff that you added to the image, but for everything that is inside. That includes the base image that you used to produce your final Docker image.
So you need to ensure license compliance not only for your code and the binaries that you put in the image, but for all the packages that are there. And you need to ensure that for each and every layer of the image. Even if you remove some packages in one layer and they are present in a previous one, you still need license compliance for them, because every layer is available to the end user.

At that point, we realized that we really need automated license compliance. But even before we started our work on automating that, we really needed to think about how to shrink our images. As I mentioned, we have more than 100 Docker images, and a number of them contain a few hundred packages. Many of those were also GPLv3, which the community decided to remove from ONAP because it's not being loved by multiple companies. So we really needed to shrink our images, reduce the number of packages that we have to care about, and automate as much as possible. We needed to ensure that compliance is being done for every layer that we have, not only for the top one.

Unfortunately, we faced a few challenges. The most important is that Docker images in general are not really friendly to any kind of introspection, so quite often we need to execute additional steps in order to collect all the necessary data about the content of our Dockers.

So what was missing, what did we need at that point? First of all, we needed to really know what we have inside our Dockers, so we needed to generate a software bill of materials (SBOM) with all the data required to produce a compliance report. We also needed policies in order to be able to monitor different categories of licenses, because some licenses are compatible with each other, some are not, some are prohibited by the community, some are blessed. And we needed to integrate all of that with our existing CI.
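A minimal sketch of why per-layer analysis matters (our own illustration with hypothetical package names, not any particular tool's API): even if a later layer removes a package, the union of packages across all layers is what you have to clear, because every layer ships to the end user.

```python
# Illustrative sketch: package names and layer contents are hypothetical.
# Each layer records packages added and removed; the final filesystem only
# shows the survivors, but compliance must cover everything that appears
# in ANY layer, since each layer is distributed with the image.

layers = [
    {"added": {"musl", "busybox", "openssl"}, "removed": set()},  # base image
    {"added": {"gcc", "make"}, "removed": set()},                 # build tools
    {"added": {"myapp"}, "removed": {"gcc", "make"}},             # cleanup layer
]

def final_packages(layers):
    """Packages visible in the top layer's merged filesystem."""
    pkgs = set()
    for layer in layers:
        pkgs |= layer["added"]
        pkgs -= layer["removed"]
    return pkgs

def packages_needing_compliance(layers):
    """Every package that exists in at least one layer."""
    return set().union(*(layer["added"] for layer in layers))

print(sorted(final_packages(layers)))
print(sorted(packages_needing_compliance(layers)))
```

Note how `gcc` and `make` disappear from the final filesystem but still need compliance, because the middle layer containing them is pulled by every user of the image.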
We don't want yet another system, because we already have two that people need to care about. So we want to make sure that the amount of work we create for the community is minimal. We also need to integrate this with our existing image release process, because we want to provide feedback to the developer even before the image is available in Nexus. Some other considerations from our design phase: we want to automate as much as possible. As I mentioned, ONAP is not about license compliance; it's about orchestrating VNFs, CNFs and all the other stuff. We also prefer an open source solution. ONAP is open source, and we want our tooling to be open source too, just to make sure that we are able to extend the mechanisms that we are using.

As Krzysztof has mentioned, first of all we needed an SBOM, so I went looking for a tool that would generate one. We found Tern, which can analyze Docker containers out of the box. It is a CLI app that takes a container image or a Dockerfile and generates an SBOM for it. It's pluggable and has upstream support for ScanCode Toolkit. On its own, it supports only packages, but ScanCode Toolkit can be used both to extend package metadata and to scan files. Tern on its own will only be as precise as the metadata reported by the package managers it queries. Tern gave us the ability to easily show people what's in the containers they build and what licenses the components in those images have, which is why we integrated Tern into the weekly CI chain mentioned before. But we wanted to provide feedback sooner; the best case scenario would be in Gerrit, which Tern couldn't give us. We want to give developers feedback as soon as possible, so we need a tool that we can run on public CI chains. Tern works by mounting layers one by one with overlayfs, chrooting in, and querying package managers for info on licenses and copyrights.
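To illustrate why precision is bounded by package-manager metadata, here is a rough sketch (sample data in the style of dpkg's status file, abridged and with hypothetical versions; this is not Tern's actual code) of what a package-manager query yields. Notably, dpkg records have no license field at all; licenses live in separate copyright files, which is exactly why a file-level scanner like ScanCode Toolkit is needed on top.

```python
# Sample stanzas in the style of /var/lib/dpkg/status (abridged; versions
# are hypothetical). dpkg metadata carries package names and versions but
# no license identifiers -- a package-manager query alone cannot produce
# a complete compliance record.

STATUS_SAMPLE = """\
Package: zlib1g
Version: 1:1.2.11.dfsg-1
Architecture: amd64

Package: libssl1.1
Version: 1.1.1d-0+deb10u6
Architecture: amd64
"""

def parse_status(text):
    """Parse dpkg-status-style stanzas into a list of field dicts."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ": " in line:
                key, value = line.split(": ", 1)
                fields[key] = value
        packages.append(fields)
    return packages

for pkg in parse_status(STATUS_SAMPLE):
    # Name and version are all we get; license info must come from elsewhere.
    print(pkg["Package"], pkg["Version"])
```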
This means that someone could push something that we don't want to execute in such an environment, disguised as a package manager, and possibly infect our machines. Tern also requires Docker socket access, and some extra privileges on older kernels without userspace overlayfs support. Overlayfs and Docker did fail quite often, so each week we had a bunch of Docker images that failed to scan. Tern might still be okay for people who develop closed solutions or have restricted push access, but it didn't work for us, so we had to switch to another tool that emerged in the meantime.

The tool of our choice is ScanCode.io, which is a Django-based wrapper around ScanCode Toolkit, extending and specializing its features. Being Django-based, it obviously has a REST API and a web GUI, but there's also a CLI. Another factor that prompted us to switch is that nexB, the authors of the tool and of ScanCode Toolkit, have a lot of supporting libraries for software composition analysis and related stuff, like packagedcode for handling stuff related to packages, or fetchcode for, well, you get the idea. ScanCode.io is also easily extendable and customizable thanks to its concepts of pipes and pipelines. And we did need some extensions, especially given the challenges that Alpine Linux has given us.

Alpine Linux has a really small footprint while still having typical OS stuff like a package manager, so you can just `apk add` all your dependencies and run whatever software you need there. And the cherry on top is that the base image is GPLv3-free, as Alpine uses musl and BusyBox. But it's not all okay, because Alpine packages don't have sufficient information for license compliance. For example, there is no copyright info, which is required by almost every open source license. We only get license identifiers, and those are easy to get wrong, like with BSD, which has 0-Clause, 1-Clause and other variants; it has had some iterations. There are also projects that take a license and add something to it.
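The pipes-and-pipelines idea can be sketched like this (a generic illustration of the concept, not ScanCode.io's actual API; all step names and data here are made up): a pipeline is an ordered sequence of steps that each enrich a shared context, so a custom step can be slotted in without touching the rest.

```python
# Generic pipes-and-pipelines sketch (illustrative only, NOT ScanCode.io's
# real classes). A pipeline is an ordered list of steps; each step reads
# and enriches a shared context, so extensions -- like an extra metadata
# lookup step -- can be added without changing existing steps.

class Pipeline:
    steps = ()

    def run(self, context):
        for step in self.steps:
            step(self, context)
        return context

class ImagePipeline(Pipeline):
    def extract_layers(self, ctx):
        ctx["layers"] = ["layer0", "layer1"]  # placeholder data

    def collect_packages(self, ctx):
        ctx["packages"] = [{"name": "musl", "license": "MIT"}]

    def match_policies(self, ctx):
        prohibited = {"GPL-3.0-only"}
        ctx["violations"] = [p for p in ctx["packages"]
                             if p["license"] in prohibited]

    steps = (extract_layers, collect_packages, match_policies)

result = ImagePipeline().run({})
print(result["violations"])
```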
And it might be marked as the original license, without the additional stipulation that might be important in terms of license compliance. Luckily, that doesn't mean there is no way to gather that info, which is what we set out to do. So what we do is: we download Alpine's aports repo, which holds the build recipes for all the packages, check out the commit related to the package version in question, parse the build recipe, download the source code, and analyze it with ScanCode Toolkit to get all the missing information. This is not a perfect approach, as there are many subpackages in Alpine. Subpackages share a code repo and version with their parent packages, but use only a subset of what's in that code. This means that if the repository includes some GPLv3 code, the subpackage will be marked as including GPLv3. Until we add support for parsing the build files and limiting the source code scan accordingly, that's what we get. Nevertheless, it significantly improves Alpine scanning results, and it would not have been possible without Philippe Ombredanne and Mateusz Peretz. Thanks, guys.

So where are we going with this? We are still missing a few features that we need in ONAP. The first thing is a way to visualize inter-Docker dependencies. We have so many Dockers in ONAP, and it's hard to control which base images are used. In ONAP integration, we are maintaining two GPLv3-free base images, with Java 11 and Python 3, that we should try to popularize. I've been checking the status of utilization of those images from time to time using dockviz, which creates a dependency dot graph of all layers in the images present on a Docker instance. Those graphs are hard to read and are huge, as you can see from this bar at the bottom of the slide, and that is a limited graph of only ONAP Dockers. So we are currently developing an additional pipeline for ScanCode.io that would do what dockviz does, but without Docker socket access, and present it in a more readable manner, possibly in the web GUI.
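The first step of the aports approach, parsing the build recipe, can be sketched roughly as below. The field names (`pkgname`, `pkgver`, `license`, `source`) are real APKBUILD variables, but the parsing is deliberately simplified: real APKBUILDs are shell scripts, so a production version would need proper shell evaluation rather than plain key=value splitting.

```python
# Simplified APKBUILD parser (illustrative; real APKBUILDs are shell
# scripts with functions and variable expansion, so this only handles
# plain key=value assignments plus the common $pkgver substitution).

APKBUILD_SAMPLE = '''\
pkgname=zlib
pkgver=1.2.11
pkgrel=3
license=Zlib
source="https://zlib.net/zlib-$pkgver.tar.gz"
'''

def parse_apkbuild(text):
    fields = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip('"')
    # Expand the common $pkgver reference in the source URL.
    if "source" in fields and "pkgver" in fields:
        fields["source"] = fields["source"].replace("$pkgver", fields["pkgver"])
    return fields

info = parse_apkbuild(APKBUILD_SAMPLE)
print(info["source"])  # the tarball to fetch and scan with ScanCode Toolkit
```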
Another thing that we want to add is support for waivers, which would complement the policies that ScanCode has. Policies currently support only code-based resources and allow classifying licenses as approved, restricted or prohibited. We will need policies to also support packages, and we will need support for waivers. In ONAP, we keep waivers in a Git-controlled repo that we would like to check against, and mark the projects with waivers that use a restricted license as OK, because they do have a waiver for that.

Another thing is Gerrit integration. As I mentioned, the automated Docker release process happens via submitting a change to Gerrit, so this is really the place where we have the developer's attention. It's also the best place to provide feedback regarding any license issues found in the image that he or she is currently creating. To achieve that, we want to integrate ScanCode.io and trigger it from Jenkins, from the job that is actually executed for every new Docker release.

Another thing is the generation of final compliance records. When we know what's really inside our Dockers, we want to collect all the sources, in cases where the license requires that the source code for packages be made available, and we'd like to create a proper compliance report which people can use to present to their legal departments, or to see what's really inside ONAP and which pieces of software are actually included in our package.

So, just to sum up our presentation and the lessons that we learned during our work. First of all, we believe that choosing the right distribution for your containers is really crucial for license compliance. That's because if you are using a full distribution image like Ubuntu or Debian or anything like that, you may end up having to execute the compliance process for hundreds of packages that are there.
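A sketch of how such a waiver check could work (the waiver tuple layout, project names and field names here are all hypothetical, standing in for whatever format the version-controlled waiver repo actually uses): a finding classified as restricted by policy is downgraded to OK when a matching waiver exists.

```python
# Hypothetical waiver check: findings flagged "restricted" by policy are
# downgraded to "ok (waived)" when a matching entry exists in the
# version-controlled waiver list.

WAIVERS = {
    # (project, package, license) tuples, as they might be kept in a repo
    ("so/component-a", "readline", "GPL-3.0-only"),
}

def classify(finding, waivers):
    """Apply waivers on top of the policy classification."""
    project, package, license_id, status = finding
    if status == "restricted" and (project, package, license_id) in waivers:
        return "ok (waived)"
    return status

findings = [
    ("so/component-a", "readline", "GPL-3.0-only", "restricted"),
    ("so/component-b", "readline", "GPL-3.0-only", "restricted"),
]

for f in findings:
    print(f[0], "->", classify(f, WAIVERS))
```

The point of keeping waivers in a separate, reviewed repo is that downgrading a restricted finding becomes an explicit, auditable decision rather than something hidden in scanner configuration.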
And that's a huge piece of unnecessary work that has to be performed. It's also a great idea to share the base image among your sub-projects; that can really narrow the list of packages that you are dealing with, and make sure that everyone is up to date with the current security fixes. From our perspective, there seems to be no silver bullet, at least for now, in terms of a Docker compliance process. Even the commercial tools that are on the market really do less than you would expect them to, and obviously we are not going to name any tools here, just to not make good or bad PR for anyone. We are open source people, so we are always looking for people with similar interests to collaborate with and to create open source tooling for Docker compliance.

So now a question for you: do you have a Docker compliance process in your project, provided that you're shipping Docker containers? If not, then maybe it's a good opportunity to get in contact and start collaborating, so that we can share the workload and make sure that we all may benefit from the same solution in the community. Thank you. Thank you.