Hi there, my name is Tony Aiuto. I'm an engineer at Google, and I work on the Bazel open-source product. Today I want to talk about ensuring OSS license compliance the easy way. I'm not a lawyer. I'm a programmer who wanted to ship products containing OSS code in a way that was compliant with their licenses.

So I'm going to talk about what we're doing and the model we use for thinking about this, which allows us to scale to a very large organization, specifically Google's. The examples come from Google's version of this system. I'll talk a bit about how we can use this mechanism to do things that are compliance and audit related but not necessarily about licensing, and finally how this is available in the OSS build tool Bazel. That last point is why I've got Bazel logos on the slide rather than Google's. This should all be available to you already. It's open source; it's not proprietary to Google.

We can start by thinking about what OSS license compliance really is, and it's straightforward: I want to use other people's code, and I want to respect the terms under which it's made available. Respecting the terms has two parts: do the things that are explicitly required, and don't do the things that are prohibited.

The things that are required are usually easy to think about because they're often spelled out. For notice-style licenses, you must provide a copy of the license text and copyright with your product. It's very simple. If you have a license you pay for by seat, it's spelled out in the license: pay them this much money depending on how many units you ship. So those are easy to think about.

"Don't do prohibited things" is rarely spelled out in the license. What we have instead is the situation where certain things are difficult to comply with. For example, LGPL code has to be distributed in a way that lets the end user relink their application using their own build of the LGPL library. Yeah, I can do that in a desktop app.
It is virtually impossible, and unreasonable, to try to do that in a phone app. I could be compliant with Herculean effort, but it just isn't going to happen. So we think of that kind of thing as "don't ship LGPL code on a phone."

So this is all pretty simple, right? You read the terms of the license and you make sure you comply. You get your best and your brightest together, the engineers and the lawyers in the same room. They compare the terms of the license with the intent of the product, and if it all works, you ship it.

We do this by throwing people at it. We enumerate all the OSS packages in our product. If we're a startup, we've got 10 or 12 packages, and I can read all the licenses. We do the things that are required, mostly gathering the text of the licenses, and we don't do the things we shouldn't. Don't take GPL'd code, put it in a proprietary product, and distribute it to people. That's disallowed, right? It's not supposed to happen. If I only had five or ten packages and applications in my company, I could do this. I could probably do it myself, and I'm not a lawyer.

But I'm assuming some of you listening today work for organizations that are a little bigger than a startup. When you have 10 or 20 thousand programmers working on a thousand different products, using five thousand different OSS packages, you're too big to put the right people together in a room for every product launch. You have to automate anything you can, and you have to make sure that people are only working on the tasks where they add value: where we can't automate it, and they're the right person to do the human labor.

So let's talk about the roles involved. There are three roles in this dance. In a small team they blur, but it really does help to think about them as separate entities. We have product engineers. They understand what they're building. They're building a mobile app; it has these characteristics.
It's going to collect personal data, or not. They're close to the end user in this process.

There's another role, though, and I'm going to call them the OSS engineer, the OSS importer. In a small organization it's usually the same engineer; in a large enough organization it might be a different one. The OSS engineer is responsible for finding the package in the wild, off GitHub or Maven, bringing it into the organization, making sure it works with your build system, in our case assigning license attribution, and providing frequent pulls from upstream to get new features.

Notice that the OSS importer doesn't necessarily know where the code is going to be used. For example, for historical reasons I happen to be the maintainer of libcurl and libusb at Google, and the USB library is used in dozens of different places. I can't enumerate them; I don't keep track of that. I'm just responsible for updating things every once in a while, making sure it builds, and making sure it builds on all our platforms. But I don't get to answer the question "are you using it in the kind of place you should?"

That's really the job of the compliance team. They can evaluate whether the conditions attached to a license, that is, the restrictions and prohibitions, match the environment the product engineer wants to distribute in.

So let's combine these roles with the model of how we think about licensing. All OSS code is made available under one or more kinds of license. We use the word "kind" a lot. Each kind has specific conditions, and the products and artifacts that we build are deployed to various environments. So it's the OSS importing engineer who is closest to taking the code and looking at the license.
They'll sometimes work with the compliance team to make sure they've got the right attribution, but they're the one saying, "Oh, this library I brought in is under the Apache 2 license." That's their whole responsibility: saying it is Apache 2.

The meaning of Apache 2 belongs to the compliance team, because once you've said the kind of license is Apache 2, there are specific conditions attached to that. For Apache 2, the compliance team is typically going to say there's one condition: you need a notice that you're using it. So you have to include the license text and a copyright notice.

Another aspect that the compliance team thinks about, in conjunction with a few senior engineers, is the kinds of places, or environments, you can distribute an application to. There aren't many. There are phones. There are desktops, and maybe you want to distinguish between Mac and Windows desktops if you need to. There are servers in your own data center and servers in other people's data centers. You might distinguish geographic regions if you want to use this kind of mechanism for compliance with other legal jurisdictions.
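The model described so far (license kinds carry conditions, artifacts go to environments) can be sketched in a few lines of Python. This is only an illustration, not Google's or Bazel's implementation: the class names, the environment list, the `UNSATISFIABLE` table, and the `can_deploy` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    """Deployment environments; a real list would be owned by the compliance team."""
    PHONE = "phone"
    DESKTOP = "desktop"
    OWN_SERVER = "own_server"
    THIRD_PARTY_SERVER = "third_party_server"


@dataclass(frozen=True)
class LicenseKind:
    """A kind of license and the conditions the compliance team attached to it."""
    name: str
    conditions: frozenset


# The importer only says "this package is Apache 2"; what that means lives here.
APACHE_2 = LicenseKind("Apache-2.0", frozenset({"notice"}))
# Illustrative only: an LGPL kind carrying the relinking condition from the talk.
LGPL = LicenseKind("LGPL", frozenset({"notice", "requires_relinkable"}))

# Hypothetical policy table: conditions that cannot reasonably be met per environment.
UNSATISFIABLE = {
    Environment.PHONE: frozenset({"requires_relinkable"}),
}


def can_deploy(kind: LicenseKind, env: Environment) -> bool:
    """True if none of the kind's conditions is unsatisfiable in this environment."""
    return kind.conditions.isdisjoint(UNSATISFIABLE.get(env, frozenset()))
```

In this sketch the importing engineer only ever picks a `LicenseKind`; the condition sets and the per-environment table are the parts a compliance team would own.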
We'll come back to the jurisdiction idea later. But there are only a few of these environments, on the order of 10 to 100, not tens of thousands. And the role I mentioned before, the product engineer, doesn't really have to think about this. They just know they're building an iPhone app, and the rest of the system we build around it understands how to look at licenses to make sure they can be put in the iPhone app. So we scale by making sure humans deal with as few pieces as they need to, and that domain specialists are focused on the things in their domain and not doing grunt work in another domain.

Along with this model, we rely on some tools that help us enforce the limited amount of human intervention when we need it, and that help make sure people don't do an end run around the system, inadvertently or on purpose.

The first is that you always need a robust and auditable source code control system. You need to know that an engineer can't check something in without it showing up as a change. You have to be able to lock down certain pieces to say that they require special review. Maybe a particular team has to review them; maybe three reviewers have to review every piece of code in there. Whatever you do, you need this kind of compliance, and you need it for other things besides licenses. You need it for SOX compliance. You need it just for your own sanity and safety. If you can't tell how something got into your source code control system, you've got far bigger problems than license compliance.

The second thing we rely on is a hermetic build tool. By hermetic, I mean it can only pick up things that are in your revision control system, or potentially on a trusted server that you have populated with things from your revision control system. We have that in Bazel and in Google's internal fork, Blaze. The build file specifies everything that goes into an object, its dependencies are listed, and so on down the tree, and it always comes out to source that's checked in. If you were shipping a medical device, you can't have it include a library that happened to be on Alice's machine because she was the release engineer. You have to ship things that you've vetted and audited and made sure are security compliant and tested. Any other kind of build system isn't going to give you that peace of mind that you know the provenance of every bit in the application you finally ship.

So let's see how this looks. This diagram is a little busy; I'll go through it very quickly now, and we'll come back to it in detail. In Bazel parlance, there are files that have the rules for how we build things. They're called BUILD files; they're always named BUILD. The blue box in the upper corner talks about an iPhone app, and it's named Angry Hedgehogs. (I spelled "angry" wrong.) It's what the product engineer knows about. They say, "I'm building an iPhone app," and that's all they have to know for license compliance. It has some dependencies, in the little blue box below it, and those may depend on more things. Eventually, in this example, it depends on a piece of code from Google's Abseil library, which is available under an Apache 2 license.
We call Apache 2 a license kind. We've extracted the copyright notice and pointed to the text of the license. That license kind points to code that's also in our repository: we have a rule in the system that says the Apache 2 license kind has the notice condition attached to it.

The compliance team also owns a test here that takes in the combination of facts about the application we're building. The Bazel label, in this case, is the path to the BUILD file, a colon, and the name of the target, so it's along the lines of some/service:angry_hedgehogs. We know it's an iPhone app; that's the environment it's going into, the binary type. And because our build system has a very clear sense of the dependency graph and everything that went into the target, we can examine that to gather all the licenses used, the license kinds they use, and the union of the conditions those kinds carry. We put that together with this target and the type, and we can see if we win or not. This one should be easy. But if a condition like "requires end-user relinking," for an LGPL library, showed up, we would fail this compliance check.

Let's look at each piece a little more slowly and in more detail. We declare licenses in the package where we build the code. Well, we don't really declare the license; we say that the code is available under an applicable license. Most packages have very similar boilerplate of this type. We give a default for the entire set of code in the library. There might be multiple targets.
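Stepping back to the diagram walkthrough for a moment, the compliance check it describes (walk the dependency graph, gather license kinds, union their conditions, compare against the binary type) can be sketched in Python. This is a rough model under stated assumptions, not the actual Bazel or Google tooling: the target labels, the `iphone_app` type name, and every table and function name here are hypothetical.

```python
# Hypothetical dependency graph: target label -> direct dependencies.
DEPS = {
    "//some/service:angry_hedgehogs": ["//some/service:game_engine"],
    "//some/service:game_engine": ["//third_party/abseil:strings"],
    "//third_party/abseil:strings": [],
}

# Declared by the importing engineer: license kinds that apply to a target.
LICENSE_KINDS = {
    "//third_party/abseil:strings": ["Apache-2.0"],
}

# Owned by the compliance team: conditions per kind, and the conditions
# each binary type cannot satisfy.
CONDITIONS = {
    "Apache-2.0": {"notice"},
    "LGPL": {"notice", "requires_relinkable"},
}
UNSATISFIABLE = {
    "iphone_app": {"requires_relinkable"},
}


def gather_conditions(target, deps, license_kinds):
    """Walk the transitive dependencies and union the conditions of every kind found."""
    seen, stack, found = set(), [target], set()
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        for kind in license_kinds.get(label, []):
            found |= CONDITIONS[kind]
        stack.extend(deps.get(label, []))
    return found


def license_check(target, binary_type, deps=DEPS, license_kinds=LICENSE_KINDS):
    """Return the sorted conditions that block shipping; an empty list means a pass."""
    blocked = UNSATISFIABLE.get(binary_type, set())
    return sorted(gather_conditions(target, deps, license_kinds) & blocked)
```

With only the Abseil dependency, `license_check("//some/service:angry_hedgehogs", "iphone_app")` returns an empty list; adding a dependency whose kind carries `requires_relinkable` makes the same call return that condition, which is the failure case the talk describes.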
The library might be built in different ways, as several targets, so we give one default: the rule called license. We always have a license rule declaration named license. It usually lists the license kinds. In this case, we're saying it's Apache 2. We extract the copyright line and point to the license text file.

Note that most of this can be done by the importing engineer. The only place where they really have to consult anybody else is in making sure that the text of the license really is an Apache 2 license and that they're pointing at the right thing. We can assist with tools like automatic license classifiers that look at the license file and go, "Ah, yes, with 99 percent certainty this is an Apache 2 license; here's the right license kind declaration to put on it. But it has some extra words that we haven't categorized."

There's no getting around the human element here. Most OSS providers really do try to give the right attribution to their code. They include license files; they try to do the right thing. But some of them play with the text. They change it, they reformat it, and if they add commas, it changes the meaning. You have to audit this with a human. That's where the compliance team comes in.

Fortunately, you don't have to deal with that often, and we use our source code control system to help us out here. We have metadata checks when you try to submit code. If we're changing the content of the license text file, or changing any of the attributes of a license rule in a package, or the package default changes, then our source code control system says, "I need an additional reviewer," and it makes sure the compliance team is added. In practice this just doesn't happen much, because once people decide the license they want to make their code available under, they don't change it a lot. It's a rarity. Most of the time they just add features, and if I want to import a new release of the library to get new features, I can just do that and everything works great.

(I went too far.)

The license kind declarations sit in a code path owned by the compliance team, under compliance/rules/licenses/spdx. Bazel has an alias capability, so I can say @rules_license and have that point into my own code base. License kinds tend to accumulate in a single file, and they're pretty simple. There's a name; here we're using the SPDX identifiers. There's a list of conditions, and there are a small number of conditions. These are typical ones: notice, requires relinkable, disclose modifications (you have to publish the changes you made to a library). And we point to the canonical text.

Again, we link the source code control system's protections to the responsibilities. The declarations are in a place owned by the compliance team, so random engineers can't make things up, and the compliance team can know that the only licenses we think we're using are ones that they have vetted and looked at.

Now, a great part of the model here is that we've got this SPDX namespace, and we're using SPDX IDs, but the conditions can be organization specific. So I could conceivably look at a particular kind of license and say, "Our organization has to do something special with this, for one reason or another," and add a condition that can then be used by our own compliance tools for whatever purpose we envision. That's the important part of the model: the name of the license doesn't imply anything about where you can use it and how you can use it. It's the conditions we've attached to that name that make the choice for us.

Now we go to the product engineer's view. It's very simple for them. They don't really care about the licenses. They just know they're building an Android app.
Their app is Angry Hedgehogs. Behind that view, there are the things going on that we saw in the diagram. Their app really has a license check rule. Bazel can descend through all the dependencies of the application and gather all the licenses. There's tooling there to gather all the license texts together, order them, and compress them, so that they can be a resource in the application. That way you can have a screen that lists all the licenses and copyright notices.

And somewhere we have a target, the Angry Hedgehogs license check, that compares the name of the application we're building, the fact that it's an iPhone app, and the union of all the conditions we've found, to decide if we can ship or not. Alternatively, we could put that test within code owned by the compliance team rather than alongside your application. Either is supported by the model; it just depends on where you want to build your compliance in.

And we're not limited to the licenses you already know about. We can incorporate proprietary licenses. Say I've struck a deal with XYZ Systems to use something, and I have to pay them per seat. I don't want that in every application; I want an allow list, so only certain applications can use code that depends on it. So I can make up my own condition for that, and say that the condition requires the allow list and that the code must be on a server. It can't be shipped on an end-user device, because then I'm going to pay for a million seats.

We can also think of constraints that we impose on ourselves. Say I've written some code that would never possibly be GDPR compliant, so I never want to deploy it in the EU. Well, I can make a license for that, right?
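The per-seat example above, a made-up condition that requires an allow list and a server-only deployment, might look like the following Python sketch. Everything in it is an assumption for illustration: the `xyz_per_seat` condition name, the `//billing:report_server` label, the environment strings, and the `ConditionPolicy` structure are invented here and are not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionPolicy:
    """Organization-specific rule attached to a made-up condition name."""
    allowed_targets: frozenset = frozenset()       # empty means no allow list
    allowed_environments: frozenset = frozenset()  # empty means any environment


# Hypothetical policy for the per-seat deal: allow-listed targets, servers only.
POLICIES = {
    "xyz_per_seat": ConditionPolicy(
        allowed_targets=frozenset({"//billing:report_server"}),
        allowed_environments=frozenset({"own_server"}),
    ),
}


def violations(target, environment, conditions, policies=POLICIES):
    """Return human-readable problems for this target; an empty list means it may ship."""
    problems = []
    for condition in sorted(conditions):
        policy = policies.get(condition)
        if policy is None:
            continue  # conditions without a policy in this table are checked elsewhere
        if policy.allowed_targets and target not in policy.allowed_targets:
            problems.append(f"{condition}: {target} is not on the allow list")
        if policy.allowed_environments and environment not in policy.allowed_environments:
            problems.append(f"{condition}: not permitted in {environment}")
    return problems
```

The design point is the one made in the talk: the check never looks at the license's name, only at the conditions attached to it, so an organization can add its own conditions without touching the rest of the tooling.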
A "not GDPR compliant" marker isn't really a license, but you can use the same mechanism. So under the compliance team's internal licenses we define "not GDPR compliant," and it has a condition that says "not deploy EU."

Now imagine we've got a packaging system where I'm building a Debian package, and one of the things we throw into the package is this list of all the licenses we've used and their conditions. I distribute this package to all of my data centers. It hits the EU data center, we try to install it there, and one of the install checks on the machine goes, "Wait, this code says not deploy EU, and I'm in the EU." We can fail there. So you could use the same kind of license mechanism as part of your DevOps, to restrict yourself from running code in the wrong place, because you would not be compliant with the rule you made up yourself. And really, when you think about it, that's no different than complying with a license, which is a legal contract.

So what's the status of all this? Google has been migrating to this over the last year. We're about to sprint on getting it into Bazel, or at least making it available. The underlying code that makes this happen is there. There's a public design that you can review and comment on; it's been out for over a year. If you have questions, you can shoot me mail or catch me during the Q&A session.

With that, I thank you for being here today. I hope this was informative. Good luck, and stay compliant. Bye.