I will get started then. So this is my talk. Thank you for sticking around to the last afternoon and coming to my talk. I'm going to be talking about license compliance in embedded Linux, and we'll focus on the Yocto Project. First of all, just a bit of an introduction. A little bit about me: I've been involved in the Yocto Project since around 2013. I work across the whole embedded stack, so I touch U-Boot, the kernel, everything, but I often end up involved in release process discussions as well, which is why I'm talking about license compliance. I do consulting through Beta Five Limited; there's some contact details there. As we're talking about license compliance, I just want to start off with a bit of a disclaimer: this talk is not providing any legal advice at all. What I'm talking about is best practice based on my experience as a developer and my experience as a member of open source communities. If you've got any doubts about this stuff, consult an appropriate lawyer. By appropriate I mean someone who knows your jurisdiction and who knows their way around open source licensing. So that said, what am I going to cover? I'm going to start off going through some fairly generic best practices which aren't very Yocto Project specific. Then I'm going to dive into the tools that we've got in the Yocto Project, look at some ongoing and future work on that, and finish up with a comparison with other build systems and point out some other relevant projects you might want to take a look at. There's a couple of things I'm not going to have time to cover: I'm not going to cover any DRM or Tivoization concerns, and I'm not really going to cover how to interpret licenses and decide what you need to do. I'm going to focus on the tools and the practices. Let's start off with why you should care about this. Typically if you're selling an embedded device, distributing a physical device that has open source software on it is an act of distribution.
There is a risk of legal action here if you're not following license compliance properly. Doing things right earns you standing in the community and makes it easier to get the help and support you need from people. Another reason to care about license compliance is that you really should be retaining full sources anyway, even for things where you don't need to distribute those sources to your customers. Things do disappear off the internet, and you need to be able to step back to an old release of your product and rebuild it with minor changes. So make sure you are archiving all the source code you're using, even the things you don't necessarily have to distribute copies of. Let's dive into the general advice. This section is not really Yocto Project specific, so it's pretty applicable anywhere, really. The fundamentals of license compliance are two things: you need to provide license text and notices, and you need to provide complete corresponding source code for things that are covered under copyleft licenses. Looking at these, you end up with some questions. Where do we put this? Do we put the license text on the device? Do we put the license text in the documentation or on a website? Do we publish the source code directly to our customers, or do we give them an offer letter and let them come and ask us when they want the software? Pretty much all of these are valid options. Personally, I'd say avoid the last one, the offer letter; you're better off actually publishing this stuff to your customers straight away. Everything else, really, is a valid option. The thing I want to highlight early on is what I'm going to call the distributed image. Sounds pretty simple, but this is the image that you are actually distributing to your customers, and it may not be the image that you build initially.
You need to think about, when you ship a device to a customer, what is the actual software on that device when it goes out the door. If you're uploading software to a website where people can download it, you need to think about what is actually in the file that the customer downloads. When I say customer here, I don't just mean paying customer; I mean anyone who gets this stuff from you. Think about what they are actually getting. As we'll talk about later, this may be slightly different to what you're initially building. The first bit of advice I'm going to give is that you really want to get to a point where you've got a single command to build your final image. This might not sound like licensing advice, but it makes it much, much simpler to audit what's going into your image, to reproduce what's going into your image, and to not accidentally miss the capture of sources in your build process and end up with an accidental failure of license compliance. So, yeah, a single top-level script that you run to kick out your final image is really important for licensing, as well as other things. And when you start to put this script together, you will probably find your build process is non-trivial; there will probably be a few moving parts here. And like any piece of non-trivial software, this needs tests. So I always recommend testing your releases. Test what actually gets uploaded to the website. If you expect a certain image to come out with a manifest next to it, test that the manifest appears. If you expect something to be in /etc in the final image, open up the tarball and see if it is there. If you archive a load of sources, check you can rebuild from them. And automate these tests as well; that's going to make life a lot easier. The other thing I'll say is use the build system as far as you can, whether that's the Yocto Project, Buildroot or anything else.
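As a rough sketch of what those automated release tests might look like, here's a minimal shell function. All the artifact names (the image tarball, the manifest file, the proprietary component) are hypothetical examples for illustration, not fixed build system outputs.

```shell
# Sketch of automated release checks. check_release <dir> verifies a
# release directory before it is published. Artifact names are examples.
check_release() {
    dir="$1"
    # The image, its license manifest and the source archive should all exist.
    test -f "$dir/core-image-minimal.tar.gz" || return 1
    test -f "$dir/license.manifest" || return 1
    test -f "$dir/sources.tar.gz" || return 1
    # A file we expect in /etc of the final image should actually be there.
    tar -tzf "$dir/core-image-minimal.tar.gz" | grep -q 'etc/hostname' || return 1
    # The source archive must NOT contain the proprietary component.
    if tar -tzf "$dir/sources.tar.gz" | grep -q 'acme-secret-app'; then
        return 1
    fi
    echo "release checks passed"
}
```

Run this against the directory you actually upload. The harder test, rebuilding from the archived sources, is worth automating too, even if it takes longer.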
Try to avoid writing a post-build script that unpacks your image, throws more software in it and repacks it. I've seen that done quite a few times, and when it happens, you lose access to the license compliance and auditing tools that are in the Yocto Project and Buildroot for the modifications you've made with the post-build script. It's very easy to inject additional open source licensed content in a post-build script and not have it captured when you're gathering everything for license compliance. You can take an image you've built and move it, copy it, compress it, whatever you need to do, but I would say don't try to inject new content at this stage. Another form of post-build script is what I call factory test: the process of programming an image into your device. You may also be programming some per-device configuration, some calibration data, whatever else, and you might be running a test suite on the device to check that it's been manufactured correctly. I would say don't start adding new test software on the device, and don't start doing on-device package management in this window between the first programming and when the device goes out the door, because, again, you can end up installing new software that may be open source licensed and then isn't captured in your main license compliance workflow. License compliance doesn't just mean making sure you've got all the license text and captured all the sources for things that are copyleft; it also means not accidentally distributing the sources for proprietary software. You don't just want to grab every bit of source code that's gone into your image; if you do have proprietary components, you're probably going to need some filter in there to pick them out.
When you write those test cases to make sure that, hey, we have grabbed this source code, also make sure you haven't accidentally grabbed the source code for the proprietary component; write a test for it. I find it useful to have separate images here. If there's some proprietary application being installed in an image, it's really useful to have another image recipe next to it that produces a pure open source image without those components, so that you can do this audit as a comparison. A fairly obvious one here: you need to capture patches to the source code. If you're pulling something like BusyBox down and you end up patching it, you need to capture your patches as well for things that are copyleft licensed. Watch out for hidden patches. These are things that aren't called a .patch file but also modify the source code. If you've got a script or a function that runs as part of your build that calls out to sed and does some transformation on some source files, that's just a patch under a different name, and you need to make sure you've got a copy of that in your license compliance archives too. Another one: make sure the system records the patch order. This is handled by Buildroot and by the Yocto Project, but if you're using anything else, make sure you don't just give someone a directory of 30 patches; make sure you actually say what order they get applied in. Here's one that has multiple different interpretations. GPL version 2 does say you should include the scripts used to control compilation and installation of the software. You could interpret that to say that if you're building with the Yocto Project, your recipe for that application could be seen as the scripts used to control the compilation. There are different interpretations here. I'm not a lawyer and not giving legal advice, but this may be an area you want to look into and make sure you're doing things right.
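To illustrate the patch capture and ordering point, here's a hypothetical Yocto Project recipe fragment (the patch file names are made up). Patches listed in SRC_URI are applied in the order given, so both the patches and their order are recorded alongside the sources.

```bitbake
# Hypothetical recipe fragment: patches are applied in SRC_URI order.
SRC_URI = "https://example.com/releases/myapp-${PV}.tar.gz \
           file://0001-fix-cross-compilation.patch \
           file://0002-make-config-path-relative.patch \
           "
```

If your build also has a sed call or similar that rewrites source files, treat it the same way: it needs to end up in your compliance archive just like these .patch files do.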
Then I want to talk about a couple of pitfalls that people can fall into in terms of license compliance. One of these is just using a desktop or server distro: installing Debian on a device and then shipping that to your customer. I'd say just say no to this. It's very difficult to audit what you've put into these images, it's difficult to provide all the required source code, and you can easily end up with disparate versions of things if you run updates at different times. These really aren't the best solutions for putting on a product that's being distributed. Another thing I see a lot that causes endless problems is Docker. If you write a Dockerfile to produce a container that's going to go on an embedded device and be shipped to someone, the Dockerfile is not the complete corresponding source code for that image. You really need to make sure you're capturing everything. Be careful with that FROM line at the start of a Dockerfile that says what the base image is: you might not even know exactly what's installed in it, and there might be things that are copied in rather than installed with packaging. So watch out when using containers in embedded Linux; build the container images with something like the Yocto Project or Buildroot rather than Docker here. Another pitfall where I see things go wrong is pre-compiled toolchains. You might be using the Arm toolchain or the older Linaro toolchain, which is pre-compiled to run on the host and cross-compiles for your target. This will typically include GCC, which only runs on the host, but it will probably also include a C library, and that is going to get installed on your target. You do need to capture the source code for this as well, and that might not be an automated process.
In the Yocto Project case, if you pull in the meta-linaro layer, enable the external Linaro toolchain and then run the archiver to get all the source code, it's not going to get the source code for the external toolchain, to my knowledge. This is somewhere you might need to go and do a manual download, and you may want to test that you're actually capturing everything you need. Another thing that causes problems is language-specific package managers: npm, Cargo, all these other things. I find that these are awful for license compliance. It's often not even clear what source code corresponds to the binaries that are pulled down from whatever website they come from. These tools may not support offline compilation, they may not separate out dependencies very well, and they may not support getting the license text and source for all those dependencies. One thing I've found with Cargo specifically is that there is no guaranteed mapping between the version tag on crates.io and the actual tagged version in the Git repository, so you may not even know what version of the relevant source you need, and it may not be an easy command to just say, well, download me this source code. This is an area where you need to do a little bit of research, especially if you're writing applications in languages that use these tools; find out what the best approach is. There's too much here for me to cover in half an hour or 35 minutes. The other thing I will say is: watch out for the other insane things I've seen. Pretty much every stupid thing you can think of, somebody's makefile somewhere does it. I think the worst case I've seen was something that had a manual page written in Markdown and wanted to convert that to a manual page. So what do we do?
Well, we could use various tools to do that, or we could just find someone who's got a web app running on Heroku that does it, and in the makefile we upload the Markdown file and download the result using curl. Brilliant. Yeah, how do you know that's doing what you think it's doing? How do you build that offline? How do you build that three years later when that app has disappeared? For a manual page, maybe not that important, but if this is a bigger part of the stack, that can be a problem, and it can cause problems with license compliance as well. You may find makefiles that go away and download additional sources, and they may use online tools as I say. So be careful with this, watch out. I advise doing completely offline builds to check for it. Literally unplug the network cable from the machine (or, with BitBake, set BB_NO_NETWORK = "1" so any fetch attempt becomes an error) and check you can still build; that's probably the only way to find these things. The last thing is metadata bugs. Metadata bugs do happen. The license metadata is just a field in a recipe file that says, for example, this thing is under GPL version 2 or later. Sometimes that can be incorrect, sometimes that can be incomplete. The best advice I have is to follow stable updates. In the Yocto Project, which is the one I know, we have stable release branches, and if we find incorrect license information, that gets fixed in those stable branches. So follow these updates. For major commercial projects where you're doing things on a larger scale, I would always recommend doing your own auditing and verification here; use something like FOSSology. So that's where we are for the generic advice, which I think applies whatever your build system is. Now what I want to do is take a bit of a deeper dive into the tooling we've got in the Yocto Project itself. First of all, let's start with the simple stuff: the metadata that you find in a Yocto Project recipe. The LICENSE field is fairly straightforward.
These days we're using SPDX license identifiers, which means things are standardized, which is a great improvement. Then there's this magic LIC_FILES_CHKSUM variable. What that's there for is to catch upstream changes in the license when you update from one version to the next. Upstream might have changed their licensing in some way, so by having a checksum of the license file content in the recipe, you don't miss these changes accidentally: you get a warning that, hey, this has changed, and you know you need to go and read it, see what the difference is and update it. A bit of advice on these recipes. We do have this magic CLOSED value for LICENSE. I always say avoid this. If your software is distributed under a proprietary license, give that license a name and include the license text in your layer. The CLOSED keyword disables the license file checksum verification. You might think, hey, if it's closed, it's coming from within my organization, it's not going to change without me knowing. But in a large enough organization, other teams can change the license for a tool: something that was previously proprietary may now be released open source, and something that was under a permissive open source license may now acquire proprietary changes and have a different license. You want verification even if this isn't coming from a third party. The other thing we have, for version control repositories, is SRCREV, which says what commit hash or revision we pull from the Git repository, Mercurial repository, whatever it is. We have this lovely thing for development called AUTOREV, where you give it a branch and it will just pull the latest commit from that branch. Make sure you don't leave that on for releases. If somebody pushes something five minutes before you do your release, you might end up building the wrong version.
And if you give this to somebody else and three years later they try to rebuild it, it will be pointing at a different hash. So I always say avoid AUTOREV for releases. It's too easy to end up with a mismatch where you think you've audited something and then it's pulled a newer version, and rebuilding the image later might give a different result. When I say give your license a name and include the text, there are a couple of ways this is supported within the Yocto Project. The simplest one, for common licenses that apply across multiple recipes, is the LICENSE_PATH variable. This is just a space-separated list of directories to search for license files, and every layer can extend it: do LICENSE_PATH += "...", add a new directory, put your license text in a file in that directory, and it will get picked up. I would always recommend using this instead of CLOSED or a generic "proprietary". It just makes it easier to track what's going on and easier to audit. The other thing is for a unique license, one that only applies to a single recipe, where it might not be worth grabbing the license text and putting it in the layer. We have the NO_GENERIC_LICENSE variable, which lets you say: this only applies to this one recipe, so go and grab the license text out of the source tree. If you call your license blah, you can set NO_GENERIC_LICENSE[blah] to a file name within your source tree; it might just be LICENSE.md or something like that. This goes in the particular recipe. I would say use this rather than ignoring the warnings that say no generic license text found for blah. Give it something like this so that it can go and grab that text. Again, it just makes it a bit easier to find the text later, to audit it and to see what's going on.
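Pulling the recipe metadata advice together, here's a hypothetical sketch; the license name, the checksum value, the URL and the commit hash are all made-up placeholders for illustration.

```bitbake
# Name the proprietary license instead of using CLOSED, so the
# checksum verification still runs (all values here are placeholders).
LICENSE = "Acme-Proprietary"
LIC_FILES_CHKSUM = "file://LICENSE.txt;md5=d41d8cd98f00b204e9800998ecf8427e"

# In the layer's conf/layer.conf: tell the build where that text lives.
LICENSE_PATH += "${LAYERDIR}/custom-licenses"

# Alternatively, for a one-off license, take the text from the source tree:
# NO_GENERIC_LICENSE[Acme-Proprietary] = "LICENSE.txt"

# Pin the revision; don't leave AUTOREV on for releases.
SRC_URI = "git://git.example.com/acme-app.git;branch=master"
SRCREV = "2b1f0a9c3d4e5f60718293a4b5c6d7e8f9012345"
```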
Say we've done a build and we want to do the first thing we need for license compliance, which is to capture the license text that applies to the various components in an image. There is this tmp/deploy/licenses directory, which gets a manifest dropped into it along with all the various license texts. You can copy that, you can tar it up; that is a good first step. I say do this after a clean build. What I mean by a clean build is: if you've got two images and you build one then the other, the license text for everything in both of those images is going to end up in that licenses directory. If you are just going to copy the whole directory tree, you want to make sure you've not accidentally built something else that's irrelevant first. This is just a directory tree with one directory per package containing the license text, so you might need to do a bit of manual post-processing on it afterwards. If you want to actually include the license text in an image, so that you know that when you give somebody your device it's got the license text on it somewhere, you can set some variables in a distro conf or local.conf to copy the manifest onto the target and copy the license directories with all the license text onto the target. This places all the relevant license files into a common licenses directory within your final image. What I recommend is a slightly different approach, which is LICENSE_CREATE_PACKAGE. This means that for every package that is built, it creates a -lic package with the license text in it. This package will have some files in /usr/share/licenses, and it will be recommended by the base package: when you install busybox, it will recommend busybox-lic, and that will be installed into your image. What this does is provide an upgrade path.
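In configuration terms, the two on-target options just described look roughly like this in local.conf or a distro conf; this is a sketch, and exact behaviour varies a little between Yocto Project releases.

```bitbake
# Option 1: copy the license manifest and license texts into the rootfs.
COPY_LIC_MANIFEST = "1"
COPY_LIC_DIRS = "1"

# Option 2: build a <package>-lic package for every package, recommended
# by the base package, so license texts follow package upgrades.
LICENSE_CREATE_PACKAGE = "1"
```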
If you use on-device package management, LICENSE_CREATE_PACKAGE is the way to go, because when you upgrade a package from one version to the next and the license has changed, you'll get the new version of the license text. The other approach, copying the license directories into the image, is fine if you're not using on-device package management; but with that approach, when you upgrade a package whose license text has changed, you won't get the new license text, whereas with this one you will. Moving on to capturing the source code for copyleft things, where you've got to distribute the complete corresponding source code for the binaries you provide: there are two approaches within the Yocto Project. One is that we've got a downloads directory where all the sources pulled by the recipes as part of the build process are placed; you can ship that directory, and it has the sources in it. Great. We also have an archiver class, which provides a more flexible way of doing this. The archiver has filtering: you can filter by license, and you can filter by what's called the recipe type. That is: is it a thing that's going on the target, is it part of the cross compiler, is it a native package? This is configurable to fit your requirements. If you want to ship the downloads directory, the first thing to do is enable BB_GENERATE_MIRROR_TARBALLS. That says: when we pull a Git repository, tar up that bare repository, creating a tar.gz file that can then be used for mirroring. Then you build an image (again, a clean build, so you're not getting the downloads for unrelated things), then copy the whole directory or tar it up. You can ignore the .done files that are placed by BitBake, and you can ignore the version control subdirectories if you've got the mirror tarballs generated. That's fine, but there's no filtering on this; there's no way to exclude things from the downloads directory. We'll come to the archiver in a minute. One thing that's really useful here is generating shallow mirror tarballs.
If you've got a big repository like the Linux kernel, the version you're actually using might be an 80 or 100 megabyte tar.gz at the end of the day, but if you pull the full Git repository history, that might be two gigabytes. You can set BB_GIT_SHALLOW and BB_GENERATE_SHALLOW_TARBALLS, and that will generate mirror tarballs that contain just the commit you're actually using. On a recent project, this got our mirror size down from seven and a half gigabytes to just one gigabyte, so it can be quite nice to save wasted space. If we look at the archiver instead, the archiver has a few benefits over just grabbing the downloads directory. The archiver has multiple modes. It can grab the original source files, pretty much as unpacked. It can grab the patched version: a tarball with all the patches already applied, so you don't need to worry about them separately. There's a third mode, which captures the source after you've run configure, which I don't really see the use for myself. And there are other options: you can create a single diff.gz between the original source and the patched source, which wraps all your patches into one, and there is an option to capture the recipe files using the archiver. That one I personally would say avoid, because it's not recursive: if your recipe includes some other .inc file and that .inc file includes another .inc file, it's not going to follow all the way down that hierarchy and grab everything that gets pulled into the recipe, so you could end up with an incomplete recipe capture. What the archiver has that's really useful is what's called copyleft filtering, but really it's just license filtering, and it has an include and an exclude. I'm not sure what happens when they overlap; I didn't get a chance to look into that, but generally they won't. By default, the archiver is going to grab all the source code for things that are GPL, LGPL or AGPL based, and you can extend that with other things as well.
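Going back to the shippable downloads directory for a moment, the relevant local.conf settings are roughly these; a sketch, so check the variable names against your release's documentation.

```bitbake
# Tar up bare Git clones so the downloads directory works as a mirror.
BB_GENERATE_MIRROR_TARBALLS = "1"

# Shallow clones: mirror tarballs carry only the history you build from,
# which is the setting that shrank our mirror from ~7.5 GB to ~1 GB.
BB_GIT_SHALLOW = "1"
BB_GENERATE_SHALLOW_TARBALLS = "1"
```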
Personally, I throw into that include list pretty much anything that's open source licensed. Even if it's under an MIT or BSD license and I might not have to distribute the source for it, I think it's a nice thing to do and it makes life easier for customers, so I extend this most of the time. The exclude, COPYLEFT_LICENSE_EXCLUDE, can be extended if you've given your proprietary license its own license name: you can whack that license name in here, and then you know it's not going to grab the source code for your proprietary application. It's also filtered by recipe type. This defaults to target, which is basically things built for installation in your final image, but there are all these other recipe types as well: there's native, there's nativesdk, there's cross, and a few others. If, let's say, you build an SDK, you might want to extend this to capture some of those. So, other things we want to do: providing layers. I think the best way to capture the recipes, the scripts used to build things, the patches and everything else is just to release your whole Yocto Project layer. And I would say publish as much of this as possible, either as a tarball or, preferably, as a proper version control repository. If it is open source, you can add it to the OpenEmbedded layer index so other people can make use of it. If you've got some recipes that build proprietary components and some recipes that build open source components, I would subdivide those into separate layers, so you've got one with the open source stuff that you can easily release. When you provide the layers, watch out for changes in configuration. I've seen this a few times, where things have been added by just extending IMAGE_INSTALL in a local.conf, so you end up with modifications to the build process that aren't captured in your layer; they're just in bblayers.conf or local.conf. There are two solutions here.
You can version control local.conf so that it's part of a repository; then you're not going to get accidental modifications without knowing about them. The other option is to write a bit of script that captures these files as part of your build process. As I say, you might modify a SRC_URI for a package in your local.conf, or you might modify what gets installed into an image, so it's worthwhile capturing these files just to make sure you don't have such changes accidentally. I'm going to skip over a couple of these slides. There are more slides than I have time for, but they're attached on Sched so you can download them; they're just all the bits and pieces we have within the Yocto Project to control licensing. So I'll move on to talk about recent improvements and work in progress. One thing that's been added recently is that the INCOMPATIBLE_LICENSE variable, which can be used to exclude things like GPLv3 licensed components if you want to, can now be configured per image. So you can have a development image that has these GPLv3 components in it and a release image that doesn't, if that's the way you want to go. devtool and recipetool have much improved license handling compared to what they used to have, and in the last couple of releases of the Yocto Project we've fixed several metadata bugs where licenses were incomplete or incorrect. As for the things I've been working on: I've been working on adding a new mode to the archiver. The archiver is great, it's got this copyleft filtering, but its output can't be fed back into a future BitBake build. The reason for wanting to extend it is that I want this to be testable, and I think the best way to test that we've grabbed all the relevant source code is: can we rebuild the whole image? So this requires grabbing things in a way that can be fed back into a future BitBake build. This is probably going to end up in the 3.1 release of the Yocto Project; we missed the 3.0 cutoff for it.
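For reference, the existing archiver setup I've been describing looks roughly like this in local.conf; the extra include entries and the proprietary license name are hypothetical examples, and the include/exclude lists accept wildcard patterns.

```bitbake
INHERIT += "archiver"

# Grab sources with all patches already applied.
ARCHIVER_MODE[src] = "patched"

# Extend the default copyleft include list with permissive licenses too.
COPYLEFT_LICENSE_INCLUDE += "MIT BSD*"

# Keep our (hypothetical) proprietary component out of the archive.
COPYLEFT_LICENSE_EXCLUDE += "Acme-Proprietary"

# Archive target recipes only; extend with e.g. nativesdk if you ship SDKs.
COPYLEFT_RECIPE_TYPES = "target"
```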
This new mode supports creating either a directory per package or a single mirror directory with all the source code in it. It just uses the existing fetcher within BitBake to look at each SRC_URI entry, ask what the local file for that is, and grab it. So it's like grabbing the downloads directory, but it's got the copyleft filtering you can apply to include or exclude things, it uses sstate, and it won't accidentally pick up sources for things that went into an image you built previously. And we've got an extra filter on here as well: if you're providing your layers, you can say, well, just exclude any source with a file:// prefix, because that's going to be in the layers we provide anyway. The other thing on my agenda is this: if you grab that licenses directory, it's just a big directory of all the different license texts that apply. What I'd like to have is a single license info artifact per image, in a neat format that's easy to put on a website, put in documentation or put on an image. So I'm going to finish up with a bit of a comparison with other projects and some pointers to other projects that may be useful here. The other major build system we see used is Buildroot, and Buildroot's got a pretty good license compliance story. It's got this make legal-info target you can run, and it's documented. It's less configurable than what's in the Yocto Project, but it's still pretty good: it grabs the sources, the patches and the license text, and on a per-package basis you can set <package>_REDISTRIBUTE = NO for a proprietary component to have it excluded. So this is maybe a little less configurable, but it's there. Another thing I've seen used in products is OpenWrt. I've looked everywhere for this: I cannot find any documentation on how to do license compliance if you're shipping an image built with OpenWrt.
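Back on Buildroot for a moment: the workflow there is roughly to run make legal-info, which collects sources, patches and license texts, and to mark proprietary packages as non-redistributable in their .mk file. A sketch, with a hypothetical package name:

```makefile
# In package/acme-app/acme-app.mk (hypothetical package):
# keep this package's sources out of the legal-info output.
ACME_APP_REDISTRIBUTE = NO
```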
I think OpenWrt really needs some improvement here, because people do use it to build products and I don't know how they're doing their license compliance. So the last thing I want to do is shout out a couple of other projects in this space that are useful. The first of these is FOSSology. This is a way of running license scans, so looking through the headers of C files and seeing, well, the top-level license says GPLv2, but do you have a file buried somewhere that says GPLv3? It's got a web interface and a command line so you can do manual correction and auditing. Lawyers like this quite a lot, and we've got integration with it for the Yocto Project. Another one I want to shout out is OpenChain. OpenChain is focused on standards and providing advice for handling the software supply chain. If you get something that's got open source in it and your vendor didn't do their license compliance right, when you redistribute it you're not going to have much luck. So they look at the whole software supply chain and try to fix this by defining a specification, a training curriculum and a conformance process. The last one I want to shout out is Software Heritage. They focus on collecting and preserving software source code over the long term. They grab individual source files, which are searchable by the hash of the file, and they allow submission via a web interface or API. They're actually a project of Inria, the French national research institute for digital sciences. So yeah, I'd highly recommend having a look at those three projects. Thank you very much for attending my talk. I've not got any more time here, but I'll be stood around afterwards, so if you want to follow up, have a chat with me or drop me an email. Thank you very much.